[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

--- Comment #50 from Thomas Koenig ---
Author: tkoenig
Date: Sun Jun 2 15:18:22 2019
New Revision: 271844

URL: https://gcc.gnu.org/viewcvs?rev=271844&root=gcc&view=rev
Log:
2019-06-02  Thomas Koenig

        PR fortran/90539
        * trans-expr.c (gfc_conv_subref_array_arg): If the size of the
        expression can be determined to be one, treat it as contiguous.
        Set likelihood of presence of an actual argument according to
        PRED_FORTRAN_ABSENT_DUMMY and likelihood of being contiguous
        according to PRED_FORTRAN_CONTIGUOUS.

2019-06-02  Thomas Koenig

        PR fortran/90539
        * predict.def (PRED_FORTRAN_CONTIGUOUS): New predictor.

2019-06-02  Thomas Koenig

        PR fortran/90539
        * gfortran.dg/internal_pack_24.f90: New test.

Added:
    trunk/gcc/testsuite/gfortran.dg/internal_pack_24.f90
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/fortran/ChangeLog
    trunk/gcc/fortran/trans-expr.c
    trunk/gcc/predict.def
    trunk/gcc/testsuite/ChangeLog
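
Editorial illustration (not part of the commit): the two ideas in the log entry above, "a section of size one is contiguous" and "tell the optimizer which branch is likely", can be modelled in plain C roughly as follows. This is only a sketch under stated assumptions: the struct section1d and the functions section_is_contiguous and needs_repack are hypothetical names, not gfortran internals, and __builtin_expect is the GCC/Clang builtin standing in for the PRED_FORTRAN_* predictors.

```c
#include <stdbool.h>
#include <stddef.h>

/* GCC/Clang branch hints; on other compilers they expand to the plain
   condition.  They only stand in for the front end's predictors. */
#if defined(__GNUC__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#endif

/* Hypothetical rank-1 array section: n elements, `stride` elements apart. */
typedef struct {
    double   *data;    /* NULL models an absent optional argument */
    ptrdiff_t n;
    ptrdiff_t stride;
} section1d;

/* A section with at most one element is contiguous whatever its stride. */
static bool section_is_contiguous(const section1d *s)
{
    return s->n <= 1 || s->stride == 1;
}

/* Returns true when the caller would have to build a packed temporary. */
static bool needs_repack(const section1d *s)
{
    if (UNLIKELY(s->data == NULL))          /* absent argument: nothing to pack */
        return false;
    if (LIKELY(section_is_contiguous(s)))   /* the common, cheap case */
        return false;
    return true;
}
```

With these definitions, a one-element section with stride 4 does not need repacking, while a four-element section with stride 2 does.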
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

--- Comment #49 from Martin Liška ---
(In reply to Martin Liška from comment #48)
> I see the performance is back as seen here:
> https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=21.270.0
>
> -Ofast periodic tester hasn't finished yet, but I would close the PR.
> Thank you Thomas!

-Ofast -march=native is also fine:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=23.270.0
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

Martin Liška changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|---                         |FIXED

--- Comment #48 from Martin Liška ---
I see the performance is back as seen here:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=21.270.0

-Ofast periodic tester hasn't finished yet, but I would close the PR.
Thank you Thomas!
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539 Thomas Koenig changed: What|Removed |Added Status|ASSIGNED|WAITING --- Comment #47 from Thomas Koenig --- Waiting for feedback on the speed issue.
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539 --- Comment #46 from Thomas Koenig --- Let's see if the failures go away (they should) and also what the performance impact is now.
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

--- Comment #45 from Thomas Koenig ---
Author: tkoenig
Date: Wed May 29 20:30:45 2019
New Revision: 271751

URL: https://gcc.gnu.org/viewcvs?rev=271751&root=gcc&view=rev
Log:
2019-05-29  Thomas Koenig

        PR fortran/90539
        * gfortran.h (gfc_has_dimen_vector_ref): Add prototype.
        * trans.h (gfc_conv_subref_array_arg): Add argument check_contiguous.
        (gfc_conv_is_contiguous_expr): Add prototype.
        * frontend-passes.c (has_dimen_vector_ref): Remove prototype,
        rename to
        (gfc_has_dimen_vector_ref): New function name.
        (matmul_temp_args): Use gfc_has_dimen_vector_ref.
        (inline_matmul_assign): Likewise.
        * trans-array.c (gfc_conv_array_parameter): Also check for absence
        of a vector subscript before calling gfc_conv_subref_array_arg.
        Pass additional argument to gfc_conv_subref_array_arg.
        * trans-expr.c (gfc_conv_subref_array_arg): Add argument
        check_contiguous. If that is true, check if the argument
        is contiguous and do not repack in that case.
        * trans-intrinsic.c (gfc_conv_intrinsic_is_contiguous): Split
        away most of the work into, and call
        (gfc_conv_intrinsic_is_contiguous_expr): New function.

2019-05-29  Thomas Koenig

        PR fortran/90539
        * gfortran.dg/internal_pack_21.f90: Adjust scan patterns.
        * gfortran.dg/internal_pack_22.f90: New test.
        * gfortran.dg/internal_pack_23.f90: New test.

Added:
    trunk/gcc/testsuite/gfortran.dg/internal_pack_22.f90
    trunk/gcc/testsuite/gfortran.dg/internal_pack_23.f90
Modified:
    trunk/gcc/fortran/ChangeLog
    trunk/gcc/fortran/frontend-passes.c
    trunk/gcc/fortran/gfortran.h
    trunk/gcc/fortran/trans-array.c
    trunk/gcc/fortran/trans-expr.c
    trunk/gcc/fortran/trans-intrinsic.c
    trunk/gcc/fortran/trans.h
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gfortran.dg/internal_pack_21.f90
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539 Thomas Koenig changed: What|Removed |Added Keywords||patch, wrong-code --- Comment #44 from Thomas Koenig --- Patch here: https://gcc.gnu.org/ml/gcc-patches/2019-05/msg01882.html
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539 --- Comment #43 from Martin Liška --- (In reply to Thomas Koenig from comment #42) > Created attachment 46428 [details] > Patch which should finally work > > So, this does not regress, apparently. > > Martin, could you give this one a shot? I can verify that segmentation faults for both benchmarks are gone. Thank you very much!
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

Thomas Koenig changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #46427|0                           |1
        is obsolete|                            |

--- Comment #42 from Thomas Koenig ---
Created attachment 46428
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46428&action=edit
Patch which should finally work

So, this does not regress, apparently.

Martin, could you give this one a shot?
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

--- Comment #41 from Thomas Koenig ---
Just noticed that this causes a regression in
gfortran.fortran-torture/execute/arrayarg.f90, but only at certain
optimization levels.

Oh well... need to look some more.
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

Thomas Koenig changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #46420|0                           |1
        is obsolete|                            |

--- Comment #40 from Thomas Koenig ---
Created attachment 46427
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46427&action=edit
Updated patch

OK, so this patch fixes the shortened test case and netcdf.
It is basically the earlier one with two lines interchanged.

The idea of the patch is simple: Do the same as the library version
and don't repack if the array in question is already contiguous.

Martin, can you check if this fixes the SPEC problem, too?

If so, we can commit and then worry about fine-tuning of when
to use this and when to use the library version. I could imagine
that, for a procedure with very many arguments, using a library
function could be a win because the inlined version would use more
icache.
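
Editorial illustration (not part of the thread): the "don't repack if already contiguous" idea from comment #40 can be sketched in C as below. This is an assumption-laden model, not the code gfortran emits: desc1d, pack_if_needed and unpack_if_needed are hypothetical names, and the descriptor is reduced to a base pointer, an element count and a stride.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical rank-1 descriptor: n elements, `stride` elements apart. */
typedef struct {
    double   *base;
    ptrdiff_t n;
    ptrdiff_t stride;
} desc1d;

/* Return a pointer to n contiguous doubles.  If the data is already
   contiguous, that is the original storage and nothing is copied;
   otherwise it is a malloc'ed temporary (NULL if allocation fails). */
static double *pack_if_needed(const desc1d *d)
{
    if (d->n <= 1 || d->stride == 1)
        return d->base;

    double *tmp = malloc((size_t)d->n * sizeof *tmp);
    if (tmp == NULL)
        return NULL;
    for (ptrdiff_t i = 0; i < d->n; i++)
        tmp[i] = d->base[i * d->stride];
    return tmp;
}

/* Copy back and free only if a temporary was really made.  When `packed`
   is the original storage, the caller's data is never written to. */
static void unpack_if_needed(const desc1d *d, double *packed)
{
    if (packed == d->base)
        return;
    for (ptrdiff_t i = 0; i < d->n; i++)
        d->base[i * d->stride] = packed[i];
    free(packed);
}
```

The pointer comparison in unpack_if_needed plays the same role as the `parm.data != D.xxxx` test around _gfortran_internal_unpack in the library-based code shown in comment #15.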
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

kargl at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kargl at gcc dot gnu.org

--- Comment #39 from kargl at gcc dot gnu.org ---
(In reply to Thomas Koenig from comment #38)
> So, I finally have a self-contained test case:
>
> module t2
>   implicit none
> contains
>   subroutine foo(a)
>     real, dimension(*) :: a
>   end subroutine foo
> end module t2
> module t1
>   use t2
>   implicit none
> contains
>   subroutine bar(a)
>     real, dimension(:) :: a
>     call foo(a)
>   end subroutine bar
> end module t1
>
> program main
>   use t1
>   call bar([1.0, 2.0])
> end program main

This looks like an optimizer bug. Compiling with
-fdump-tree-original -fdump-tree-optimize -O gives

(in a.f90.004t.original)

MAIN__ ()
{
  {
    static real(kind=4) A.5[2] = {1.0e+0, 2.0e+0};
    struct array01_real(kind=4) parm.6;

    parm.6.span = 4;
    parm.6.dtype = {.elem_len=4, .rank=1, .type=3};
    parm.6.dim[0].lbound = 0;
    parm.6.dim[0].ubound = 1;
    parm.6.dim[0].stride = 1;
    parm.6.data = (void *) [0];
    parm.6.offset = 0;
    bar ();
  }
}

(in a.f90.231t.optimized)

main (integer(kind=4) argc, character(kind=1) * * argv)
{
  struct array01_real(kind=4) parm.9;
  static integer(kind=4) options.10[7] = {2116, 4095, 0, 0, 1, 0, 31};

  [local count: 1073741824]:
  _gfortran_set_args (argc_2(D), argv_3(D));
  _gfortran_set_options (7, [0]);
  # DEBUG INLINE_ENTRY MAIN__
  parm.9.span = 4;
  MEM[(struct dtype_type *) + 24B] = {};
  parm.9.dtype.elem_len = 4;
  parm.9.dtype.rank = 1;
  parm.9.dtype.type = 3;
  parm.9.dim[0].lbound = 0;
  parm.9.dim[0].ubound = 1;
  parm.9.dim[0].stride = 1;
  parm.9.data = [0];
  parm.9.offset = 0;
  bar ();
  parm.9 ={v} {CLOBBER};
  return 0;
}

Note 'static real(kind=4) A.5[2] = {1.0e+0, 2.0e+0};' in *original
appears to be A.8 in *.optimized, but the static declaration is gone.
Perhaps the Fortran FE needs to mark the actual arguments as "used"
by gfc_mark_ss_chain_used() or TREE_USED().
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

--- Comment #38 from Thomas Koenig ---
So, I finally have a self-contained test case:

module t2
  implicit none
contains
  subroutine foo(a)
    real, dimension(*) :: a
  end subroutine foo
end module t2
module t1
  use t2
  implicit none
contains
  subroutine bar(a)
    real, dimension(:) :: a
    call foo(a)
  end subroutine bar
end module t1

program main
  use t1
  call bar([1.0, 2.0])
end program main
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539 --- Comment #37 from Thomas Koenig --- Hm, with that patch, there still seems to be a failure in netcdf :-( I will keep looking (possibly some small problem with the patch).
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

--- Comment #36 from Thomas Koenig ---
... which should be

Index: testsuite/gfortran.dg/internal_pack_21.f90
===
--- testsuite/gfortran.dg/internal_pack_21.f90  (Revision 271629)
+++ testsuite/gfortran.dg/internal_pack_21.f90  (Arbeitskopie)
@@ -20,5 +20,5 @@
   USE M1
   CALL S2()
 END
-! { dg-final { scan-tree-dump-times "optional" 4 "original" } }
+! { dg-final { scan-tree-dump-times "arg_ptr" 5 "original" } }
 ! { dg-final { scan-tree-dump-not "_gfortran_internal_unpack" "original" } }
ig25@linux-p51k:~/Gcc/trunk/gcc>
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539 --- Comment #35 from Thomas Koenig --- (In reply to Thomas Koenig from comment #34) > Created attachment 46420 [details] > Patch which includes a check for being contiguous > > This patch looks like it could do the job. I'll have to work a bit > more on test cases and ChangeLog before I can submit this, but > at least it survives regression testing. ... except for a tree dump scan. I will look at this later.
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

--- Comment #34 from Thomas Koenig ---
Created attachment 46420
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46420&action=edit
Patch which includes a check for being contiguous

This patch looks like it could do the job. I'll have to work a bit
more on test cases and ChangeLog before I can submit this, but
at least it survives regression testing.
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539 --- Comment #33 from Martin Liška --- (In reply to Thomas Koenig from comment #32) > Hi Martin, > > this > > 3822 ierr = pio_put_var (tape(t)%File, ps0var, (/ps0/)) > > looks like the culprit (or rather, where gfortran currently > generates wrong code). This is consistent with the problem seen > in netcdf, so I feel pretty confident that this is indeed the problem. > > To double-check, could you maybe do the following? Assume ps0 is a > real(kind=8) variable, do > > ... > >real(kind=8) :: ps0_array(1) ! Use the type as ps0 > > and then > > ps0_array(1) = ps0 > ierr = pio_put_var (tape(t)%File, ps0var, ps0_array) > > and see if the segfault goes away, or at least if this one has > been removed, and there is a different one now :-) Yes, I can confirm it helps. I see a segfault later then. Thank you.
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

--- Comment #32 from Thomas Koenig ---
Hi Martin,

this

3822          ierr = pio_put_var (tape(t)%File, ps0var, (/ps0/))

looks like the culprit (or rather, where gfortran currently
generates wrong code). This is consistent with the problem seen
in netcdf, so I feel pretty confident that this is indeed the problem.

To double-check, could you maybe do the following? Assume ps0 is a
real(kind=8) variable, do

...
   real(kind=8) :: ps0_array(1)  ! Use the same type as ps0

and then

   ps0_array(1) = ps0
   ierr = pio_put_var (tape(t)%File, ps0var, ps0_array)

and see if the segfault goes away, or at least if this one has
been removed, and there is a different one now :-)
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

--- Comment #31 from Martin Liška ---
I see this:

(gdb) frame
#2  0x00453b06 in pionfput_mod::put_var_vdesc_1d_double (file=..., vardesc=..., ival=...) at pionfput_mod.fppized.f90:2468
2468      ierr = put_var_1d_double (File, vardesc%varid, ival)
(gdb) up
#3  0x005be633 in cam_history::h_define (restart=4294949912, t=-17380) at cam_history.fppized.f90:3822
3822          ierr = pio_put_var (tape(t)%File, ps0var, (/ps0/))
(gdb) up
#4  cam_history::wshist (rgnht_in=) at cam_history.fppized.f90:4461
4461          call h_define (t, restart)
(gdb) up
#5  0x007811dc in cam_comp::cam_run4 (cam_out=..., cam_in=..., rstwr=.FALSE., nlend=.FALSE., yr_spec=0, mon_spec=1, day_spec=1, sec_spec=1800) at cam_comp.fppized.f90:325
325       call wshist ()
(gdb) up
#6  0x0079d809 in atm_comp_mct::atm_run_mct (eclock=..., cdata_a=..., x2a_a=..., a2x_a=...) at atm_comp_mct.fppized.f90:513
513           yr_spec=yr_sync, mon_spec=mon_sync, day_spec=day_sync, sec_spec=tod_sync)
(gdb) up
#7  0x007deb03 in ccsm_comp_mod::ccsm_run () at ccsm_comp_mod.fppized.f90:2408
2408          call atm_run_mct( EClock_a, cdata_aa, x2a_aa, a2x_aa)
(gdb) up
#8  0x00403772 in ccsm_driver () at ccsm_driver.fppized.f90:58
58        call ccsm_run()
(gdb) up
#9  main (argc=argc@entry=1, argv=0x7fffdffe) at ccsm_driver.fppized.f90:25
25        use shr_sys_mod, only: shr_sys_abort
(gdb) up
#10 0x779b6b7b in __libc_start_main (main=0x403740 , argc=1, argv=0x7fffdb98, init=, fini=, rtld_fini=, stack_end=0x7fffdb88) at ../csu/libc-start.c:308
308       result = main (argc, argv, __environ MAIN_AUXVEC_PARAM);
(gdb) up
#11 0x004037ba in _start () at ../sysdeps/x86_64/start.S:120
120     ../sysdeps/x86_64/start.S: No such file or directory.
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

--- Comment #30 from Thomas Koenig ---
Hi,

what I mean is if you use "up" several times and list the
source of the calling routines, do you encounter something like

   call foo([1.0, 2.0, 3.0, 4.0])

or

   call foo((/1.0, 2.0, 3.0, 4.0/))

?

This is what I see for netcdf, and then I can also understand what
goes wrong. Such an array constructor would be in read-only memory,
and the current version would try to write back to it on exit - ouch :-)
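
Editorial illustration (not part of the thread): a rough C model of why an unconditional copy-out is a problem when the actual argument is an array constructor. All names here (callee_sum, call_with_copy_back, call_direct, stores_to_actual) are hypothetical. The counter only records that the caller's storage was written; in the real failure that store lands in read-only memory and the program segfaults.

```c
#include <stddef.h>
#include <string.h>

/* Counts stores made to the caller's storage; stands in for the write
   that faults when that storage is read-only. */
static int stores_to_actual = 0;

/* Stand-in for the called procedure: it only reads its argument. */
static double callee_sum(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unconditional copy-in/copy-out through a caller-supplied temporary.
   The copy-out writes to `actual` even though the callee changed nothing. */
static double call_with_copy_back(double *actual, size_t n, double *tmp)
{
    memcpy(tmp, actual, n * sizeof *tmp);     /* copy in  */
    double r = callee_sum(tmp, n);
    memcpy(actual, tmp, n * sizeof *tmp);     /* copy out */
    stores_to_actual++;
    return r;
}

/* Contiguous data is passed straight through: no temporary and no store,
   so a constant argument is safe. */
static double call_direct(const double *actual, size_t n)
{
    return callee_sum(actual, n);
}
```

Only call_direct may be given a `static const double` array, which is the C counterpart of the constant array constructor in `call foo([1.0, 2.0, 3.0, 4.0])`.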
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

--- Comment #29 from Martin Liška ---
(In reply to Thomas Koenig from comment #28)
> https://gcc.gnu.org/ml/fortran/2019-05/msg00173.html reports
> the same symptoms for netcdf-fortran-4.4.5, presumably due
> to the same issue.
>
> I'll try to fix that one and then see if the SPEC failure disappears
> along with it.
>
> Martin, one additional question: When you step up from the segfault
> in the executable, is an array constructor passed as an argument
> somewhere up the call chain? This is what appears to cause the trouble
> in netcdf.

How can I investigate that?

Backtrace:

#0  0x008f706c in netcdf::nf90_put_var_1d_eightbytereal (ncid=7, varid=23, values=..., start=, count=, stride=, map=...) at netcdf_expanded.f90:1471
#1  0x00453a94 in pionfput_mod::put_var_1d_double (file=..., varid=23, ival=...) at pionfput_mod.fppized.f90:1476
#2  0x00453b06 in pionfput_mod::put_var_vdesc_1d_double (file=..., vardesc=..., ival=...) at pionfput_mod.fppized.f90:2468
#3  0x005be633 in cam_history::h_define (restart=4294949912, t=-17380) at cam_history.fppized.f90:3822
#4  cam_history::wshist (rgnht_in=) at cam_history.fppized.f90:4461
#5  0x007811dc in cam_comp::cam_run4 (cam_out=..., cam_in=..., rstwr=.FALSE., nlend=.FALSE., yr_spec=0, mon_spec=1, day_spec=1, sec_spec=1800) at cam_comp.fppized.f90:325
#6  0x0079d809 in atm_comp_mct::atm_run_mct (eclock=..., cdata_a=..., x2a_a=..., a2x_a=...) at atm_comp_mct.fppized.f90:513
#7  0x007deb03 in ccsm_comp_mod::ccsm_run () at ccsm_comp_mod.fppized.f90:2408
#8  0x00403772 in ccsm_driver () at ccsm_driver.fppized.f90:58
#9  main (argc=argc@entry=1, argv=0x7fffdffe) at ccsm_driver.fppized.f90:25
#10 0x779b6b7b in __libc_start_main (main=0x403740 , argc=1, argv=0x7fffdb98, init=, fini=, rtld_fini=, stack_end=0x7fffdb88) at ../csu/libc-start.c:308
#11 0x004037ba in _start () at ../sysdeps/x86_64/start.S:120

#2  0x00453b06 in pionfput_mod::put_var_vdesc_1d_double (file=..., vardesc=..., ival=...)
    at pionfput_mod.fppized.f90:2468
2468      ierr = put_var_1d_double (File, vardesc%varid, ival)
(gdb) info locals
ierr =
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

Thomas Koenig changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |tkoenig at gcc dot gnu.org

--- Comment #28 from Thomas Koenig ---
https://gcc.gnu.org/ml/fortran/2019-05/msg00173.html reports
the same symptoms for netcdf-fortran-4.4.5, presumably due
to the same issue.

I'll try to fix that one and then see if the SPEC failure disappears
along with it.

Martin, one additional question: When you step up from the segfault
in the executable, is an array constructor passed as an argument
somewhere up the call chain? This is what appears to cause the trouble
in netcdf.
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539 Martin Liška changed: What|Removed |Added CC||seurer at gcc dot gnu.org --- Comment #27 from Martin Liška --- *** Bug 90619 has been marked as a duplicate of this bug. ***
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

--- Comment #26 from Thomas Koenig ---
Author: tkoenig
Date: Sun May 26 14:02:51 2019
New Revision: 271630

URL: https://gcc.gnu.org/viewcvs?rev=271630&root=gcc&view=rev
Log:
2019-05-26  Thomas Koenig

        PR fortran/90539
        * trans-types.c (get_formal_from_actual_arglist): Set rank
        and lower bound for assumed size arguments.

Modified:
    trunk/gcc/fortran/ChangeLog
    trunk/gcc/fortran/trans-types.c
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539 --- Comment #25 from Martin Liška --- (In reply to Thomas Koenig from comment #22) > I've been trying out some things, and I cannot construct a failing > test case. > > A sane way to build such an interface would be > > cat tst.f90 > module x > use, intrinsic :: iso_c_binding, only : c_double > implicit none > interface > subroutine foo(a) bind(c) >import >real(kind=c_double) :: a(*) > end subroutine foo > end interface > private > public :: bar > > contains > subroutine bar(a) > real(kind=c_double), dimension(:) :: a > a = 42._c_double > call foo(a) > end subroutine bar > end module x > > program main > use, intrinsic :: iso_c_binding, only : c_double > use x > implicit none > real(kind=c_double), dimension(1) :: a > call bar(a) > end program main > $ cat foo.c > #include > > void foo (double *a) > { > printf("%f\n", *a); > } > $ gfortran -flto -O tst.f90 foo.c > $ ./a.out > 42.00 > > This works as expected. > > What I do not understand is (comment #17) > > (gdb) p debug(fsym) > || symbol: '_formal_107' > type spec : (REAL 8) > attributes: (VARIABLE DIMENSION DUMMY) > Array spec:(0 [0]) > > > This means that the dummy parameter has rank zero. How, then, > is it possible to pass a rank-1 argument to it? > > (gdb) p debug(expr) > nf90_put_var_1d_eightbytereal:values(FULL) (REAL 8) > > (gdb) p *expr->ref > $8 = { > type = REF_ARRAY, > u = { > ar = { > type = AR_FULL, > dimen = 1, > codimen = 0, > > Something very fishy going on here. > > Please look up the Fortran interface to the C function that is called, > nc_put_vara_double. 
> > Also, please break on gfc_conv_procedure_call for the call > in question and do > > $ call debug(sym) > $ p args > $ call debug(args->expr) > $ p args->next > $ call debug(args->next->expr) (gdb) call debug(sym) || symbol: 'nf_put_vara_double' type spec : (INTEGER 4) attributes: (PROCEDURE EXTERNAL-PROC IMPLICIT-SAVE EXTERNAL FUNCTION) result: nf_put_vara_double Formal arglist: _formal_103 _formal_104 _formal_105 _formal_106 _formal_107 (gdb) p args $4 = (gfc_actual_arglist *) 0x2a766f0 (gdb) call debug(args->expr) nf90_put_var_1d_eightbytereal:ncid (INTEGER 4) (gdb) p args->next $5 = (gfc_actual_arglist *) 0x2a72150 (gdb) call debug(args->next->expr) nf90_put_var_1d_eightbytereal:varid (INTEGER 4) (gdb) call debug(args->next->next->expr) nf90_put_var_1d_eightbytereal:localstart(FULL) (INTEGER 4) (gdb) call debug(args->next->next->next->expr) nf90_put_var_1d_eightbytereal:localcount(FULL) (INTEGER 4) (gdb) call debug(args->next->next->next->next->expr) nf90_put_var_1d_eightbytereal:values(FULL) (REAL 8) > > ... and so on, until args->...->next becomes a null pointer. > > I am starting do suspect that this is, in fact, another piece of SPEC > bugware where they made some sort of broken interface between C > and Fortran, which is exposed by my patch. That's likely :) Hope my remove gdb session helped. > > Hmpf...
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

--- Comment #24 from Martin Liška ---
One more note: the problematic code lives in src/netcdf/*, and the same
code is also contained in:

benchspec/CPU/521.wrf_r/src/netcdf/
and
benchspec/CPU/628.pop2_s/src/netcdf/

So that would also explain the segfault of the wrf benchmark.
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

--- Comment #23 from Martin Liška ---
(In reply to Thomas Koenig from comment #21)
> OK, if the callee is a C function... what is its declaration
> on the Fortran side? Is there any interface, bind(c) or otherwise?
>
> I suppose there must be something, otherwise nf_put_vara_double would
> have a trailing underscore.
>
> On the caller side, I see that an array is passed, but the fsym
> has rank=0. I think this would be flagged otherwise.

So ncfortran.h contains:

#define nf_put_vara_double nf_put_vara_double_

And Fortran interface is defined in netcdf/include/netcdf.inc:

      integer         nf_put_vara_double
!                         (integer             ncid,
!                          integer             varid,
!                          integer             start(1),
!                          integer             count(1),
!                          doubleprecision     dvals(1))
      external        nf_put_vara_double
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

--- Comment #22 from Thomas Koenig ---
I've been trying out some things, and I cannot construct a failing
test case.

A sane way to build such an interface would be

$ cat tst.f90
module x
  use, intrinsic :: iso_c_binding, only : c_double
  implicit none
  interface
     subroutine foo(a) bind(c)
       import
       real(kind=c_double) :: a(*)
     end subroutine foo
  end interface
  private
  public :: bar

contains
  subroutine bar(a)
    real(kind=c_double), dimension(:) :: a
    a = 42._c_double
    call foo(a)
  end subroutine bar
end module x

program main
  use, intrinsic :: iso_c_binding, only : c_double
  use x
  implicit none
  real(kind=c_double), dimension(1) :: a
  call bar(a)
end program main
$ cat foo.c
#include <stdio.h>

void foo (double *a)
{
  printf("%f\n", *a);
}
$ gfortran -flto -O tst.f90 foo.c
$ ./a.out
42.00

This works as expected.

What I do not understand is (comment #17)

(gdb) p debug(fsym)
|| symbol: '_formal_107'
  type spec : (REAL 8)
  attributes: (VARIABLE DIMENSION DUMMY)
  Array spec:(0 [0])

This means that the dummy parameter has rank zero. How, then,
is it possible to pass a rank-1 argument to it?

(gdb) p debug(expr)
nf90_put_var_1d_eightbytereal:values(FULL) (REAL 8)

(gdb) p *expr->ref
$8 = {
  type = REF_ARRAY,
  u = {
    ar = {
      type = AR_FULL,
      dimen = 1,
      codimen = 0,

Something very fishy going on here.

Please look up the Fortran interface to the C function that is called,
nc_put_vara_double.

Also, please break on gfc_conv_procedure_call for the call
in question and do

$ call debug(sym)
$ p args
$ call debug(args->expr)
$ p args->next
$ call debug(args->next->expr)

... and so on, until args->...->next becomes a null pointer.

I am starting to suspect that this is, in fact, another piece of SPEC
bugware where they made some sort of broken interface between C
and Fortran, which is exposed by my patch.

Hmpf...
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539 --- Comment #21 from Thomas Koenig --- OK, if the callee is a C function... what is its declaration on the Fortran side? Is there any interface, bind(c) or otherwise? I suppose there must be something, otherwise nf_put_vara_double would have a trailing underscore. On the caller side, I see that an array is passed, but the fsym has rank=0. I think this would be flagged otherwise.
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539 --- Comment #20 from Martin Liška --- (In reply to Thomas Koenig from comment #19) > Thanks. > > A bit more: > > What are the declarations of the actual srgument, > of the dummy argument (on the callee side), > and what is the argument in the call list? > > > Ill try to construct a test case tonight then. So the callee is actually a C function: ;; Function nf_put_vara_double_ (null) ;; enabled by -tree-original { size_t B3[512]; size_t B4[512]; int A0; # DEBUG BEGIN STMT; size_t B3[512]; # DEBUG BEGIN STMT; size_t B4[512]; # DEBUG BEGIN STMT; int A0; # DEBUG BEGIN STMT; A0 = nc_put_vara_double (*fncid, *fvarid + -1, (const size_t *) f2c_coords (*fncid, *fvarid + -1, (const int *) A3, (size_t *) ), (const size_t *) f2c_counts (*fncid, *fvarid + -1, (const int *) A4, (size_t *) ), A5); # DEBUG BEGIN STMT; return A0; } where nc_put_vara_double is defined as: int nc_put_vara_double(int ncid, int varid, const size_t *start, const size_t *edges, const double *value) { int status = NC_NOERR; NC *ncp; const NC_var *varp; int ii; size_t iocount; status = NC_check_id(ncid, ); if(status != NC_NOERR) return status; if(NC_readonly(ncp)) return NC_EPERM; if(NC_indef(ncp)) return NC_EINDEFINE; varp = NC_lookupvar(ncp, varid); if(varp == NULL) return NC_ENOTVAR; /* TODO: lost NC_EGLOBAL */ if(varp->type == NC_CHAR) return NC_ECHAR; status = NCcoordck(ncp, varp, start); if(status != NC_NOERR) return status; status = NCedgeck(ncp, varp, start, edges); if(status != NC_NOERR) return status; if(varp->ndims == 0) /* scalar variable */ { return( putNCv_double(ncp, varp, start, 1, value) ); } if(IS_RECVAR(varp)) { status = NCvnrecs(ncp, *start + *edges); if(status != NC_NOERR) return status; if(varp->ndims == 1 && ncp->recsize <= varp->len) { /* one dimensional && the only record variable */ return( putNCv_double(ncp, varp, start, *edges, value) ); } } /* * find max contiguous * and accumulate max count for a single io operation */ ii = 
NCiocount(ncp, varp, edges, ); if(ii == -1) { return( putNCv_double(ncp, varp, start, iocount, value) ); } assert(ii >= 0); { /* inline */ ALLOC_ONSTACK(coord, size_t, varp->ndims); ALLOC_ONSTACK(upper, size_t, varp->ndims); const size_t index = ii; /* copy in starting indices */ (void) memcpy(coord, start, varp->ndims * sizeof(size_t)); /* set up in maximum indices */ set_upper(upper, start, edges, [varp->ndims]); /* ripple counter */ while(*coord < *upper) { const int lstatus = putNCv_double(ncp, varp, coord, iocount, value); if(lstatus != NC_NOERR) { if(lstatus != NC_ERANGE) { status = lstatus; /* fatal for the loop */ break; } /* else NC_ERANGE, not fatal for the loop */ if(status == NC_NOERR) status = lstatus; } value += iocount; odo1(start, upper, coord, [index], [index]); } FREE_ONSTACK(upper); FREE_ONSTACK(coord); } /* end inline */ return status; } that calls:
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

--- Comment #19 from Thomas Koenig ---
Thanks.

A bit more:

What are the declarations of the actual argument,
of the dummy argument (on the callee side),
and what is the argument in the call list?

I'll try to construct a test case tonight then.
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

--- Comment #18 from Martin Liška ---
$ cat -n netcdf/netcdf_expanded.f90:
...
  1470       print *,shape(values)
  1471       print *,size(values)
  1472       print *,is_contiguous(values)
  1473
  1474       nf90_put_var_1D_EightByteReal = &
  1475          nf_put_vara_double(ncid, varid, localStart, localCount, values)
  1476     end if
  1477   end function nf90_put_var_1D_EightByteReal
...

gets me:

 1
 1
 T

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7f955f316b40 in ???
#1  0x7f955f315d75 in ???
#2  0x7f955efc3e0f in ???
        at /usr/src/debug/glibc-2.29-5.1.x86_64/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
#3  0x8e905c in __netcdf_MOD_nf90_put_var_1d_eightbytereal
        at /home/marxin/Programming/cpu2017/benchspec/CPU/527.cam4_r/build/build_peak_gcc7-m64./netcdf_expanded.f90:1475

So print result is: 1, 1, T.
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539 --- Comment #17 from Martin Liška --- (In reply to Thomas Koenig from comment #16)
> Hi Martin,
>
> Is this for the slowdown or for the wrong-code issue?

It's the wrong-code issue in the cam4_r benchmark.

> To get another view, from a gdb session of the compiler:
>
> call debug(expr)
> call debug(fsym)

(gdb) p debug(expr)
nf90_put_var_1d_eightbytereal:values(FULL) (REAL 8)
$3 = void
(gdb) p debug(fsym)
|| symbol: '_formal_107'
type spec : (REAL 8)
attributes: (VARIABLE DIMENSION DUMMY)
Array spec:(0 [0])
$4 = void

> a look at expr->symtree->n.sym (I think call debug(expr->symtree->n.sym)
> will also work),

(gdb) call debug(expr->symtree->n.sym)
|| symbol: 'values'
type spec : (REAL 8)
attributes: (VARIABLE DIMENSION DUMMY(IN))
Array spec:(1 [0] AS_ASSUMED_SHAPE 1 () )

> a look at expr->ref (follow a few pointers)

(gdb) p *expr->ref
$8 = {
  type = REF_ARRAY,
  u = {
    ar = { type = AR_FULL, dimen = 1, codimen = 0, in_allocate = false, team = 0x0, stat = 0x0,
           where = { nextc = 0x0, lb = 0x0 }, as = 0x27d7ee0, c_where = {{ nextc = 0x0, lb = 0x0 } },
           start = {0x0 }, end = {0x0 }, stride = {0x0 }, dimen_type = {DIMEN_RANGE, 0 } },
    c = { component = 0x10001, sym = 0x0 },
    ss = { start = 0x10001, end = 0x0, length = 0x0 },
    i = INQUIRY_IM
  },
  next = 0x0
}

> a look at fsym->as (also follow non-zero pointers).

(gdb) p *fsym->as
$9 = { rank = 0, corank = 0, type = AS_ASSUMED_SIZE, cotype = 0, lower = {0x0 }, upper = {0x0 }, cray_pointee = false, cp_was_assumed = false, resolved = false }

> Also, if you have
>
> call foo(...,a, ...)
>
> you can put
>
> print *,shape(a)
> print *,size(a)
> print *,is_contiguous(a)

Let me work on this..

> into the source, run it and see what you get.
>
> Also, look into the callee if there is a bounds violation - what
> is the dummy argument declared as on the callee's side?
>
> Maybe you could also put
>
> subroutine foo (, a, ...)
>
> print *,shape(a)
> print *,size(a)
> print *,is_contiguous(a)
>
> into the source code and paste the output.
>
> Regards
>
> Thomas
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539 --- Comment #16 from Thomas Koenig --- Hi Martin,

Is this for the slowdown or for the wrong-code issue?

To get another view, from a gdb session of the compiler:

call debug(expr)
call debug(fsym)

a look at expr->symtree->n.sym (I think call debug(expr->symtree->n.sym) will also work),

a look at expr->ref (follow a few pointers)

a look at fsym->as (also follow non-zero pointers).

Also, if you have

call foo(...,a, ...)

you can put

print *,shape(a)
print *,size(a)
print *,is_contiguous(a)

into the source, run it and see what you get.

Also, look into the callee if there is a bounds violation - what is the dummy argument declared as on the callee's side?

Maybe you could also put

subroutine foo (, a, ...)

print *,shape(a)
print *,size(a)
print *,is_contiguous(a)

into the source code and paste the output.

Regards

Thomas
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539 --- Comment #15 from Martin Liška --- Resulting difference in original dump file is:

BEFORE:

    D.20757 = _gfortran_internal_pack ();
    __result_nf90_put_var_1d_eigh = nf_put_vara_double ((integer(kind=4) *) ncid, (integer(kind=4) *) varid, , , D.20757);
    if ((real(kind=8)[0:] *) parm.2491.data != (real(kind=8)[0:] *) D.20757)
      {
        _gfortran_internal_unpack (, D.20757);
        __builtin_free (D.20757);
      }

AFTER:

    D.20757 = offset.2468;
    D.20758 = ubound.2466;
    D.20759 = D.20758 + -1;
    typedef real(kind=8) [0:];
    atmp.2492.dtype = {.elem_len=8, .rank=1, .type=3};
    atmp.2492.dim[0].stride = 1;
    atmp.2492.dim[0].lbound = 0;
    atmp.2492.dim[0].ubound = D.20759;
    D.20767 = D.20759 < 0;
    D.20768 = D.20759 + 1;
    atmp.2492.span = 8;
    D.20769 = (void * restrict) __builtin_malloc (D.20767 ? 1 : MAX_EXPR <(unsigned long) (D.20768 * 8), 1>);
    D.20770 = D.20769;
    atmp.2492.data = D.20770;
    atmp.2492.offset = 0;
    {
      integer(kind=8) S.2493;
      integer(kind=8) D.20772;

      D.20772 = stride.2467;
      S.2493 = 0;
      while (1)
        {
          if (S.2493 > D.20759) goto L.778;
          (*(real(kind=8)[0:] * restrict) atmp.2492.data)[S.2493] = (*values.0)[(S.2493 + 1) * D.20772 + D.20757];
          S.2493 = S.2493 + 1;
        }
      L.778:;
    }
    __result_nf90_put_var_1d_eigh = nf_put_vara_double ((integer(kind=4) *) ncid, (integer(kind=4) *) varid, , , (real(kind=8)[0:] * restrict) atmp.2492.data);
    D.20774 = offset.2468;
    D.20775 = ubound.2466;
    {
      integer(kind=8) S.2494;
      integer(kind=8) D.20778;

      D.20778 = stride.2467;
      D.20776 = -1;
      S.2494 = 1;
      while (1)
        {
          if (S.2494 > D.20775) goto L.779;
          (*values.0)[S.2494 * D.20778 + D.20774] = (*(real(kind=8)[0:] * restrict) atmp.2492.data)[S.2494 + D.20776];
          S.2494 = S.2494 + 1;
        }
      L.779:;
    }
    __builtin_free ((void *) atmp.2492.data);

@Thomas: Can you please provide another hint what to do now?
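To make the difference between the two dumps easier to see, here is a rough C sketch of both strategies. It is only an illustration: `desc1`, `scale2`, `call_pack_on_demand` and `call_always_copy` are hypothetical names, the descriptor is not libgfortran's real layout, and the "pack on demand" variant only imitates what the pointer comparison in the BEFORE dump suggests `_gfortran_internal_pack` does. Each function returns the number of elements it copied, so the extra work of the unconditional version is visible.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical rank-1 descriptor; NOT libgfortran's real layout. */
typedef struct {
    double *data;   /* address of the first element */
    long stride;    /* distance between elements, counted in elements */
    long n;         /* number of elements */
} desc1;

/* Hypothetical callee that requires contiguous storage. */
static void scale2(double *a, long n)
{
    for (long i = 0; i < n; i++)
        a[i] *= 2.0;
}

/* BEFORE-style: a temporary is created only for strided data.
   Returns the number of elements copied (in plus out). */
long call_pack_on_demand(desc1 *d)
{
    assert(d->n >= 0 && d->stride >= 1);
    long copied = 0;
    double *p = d->data;

    if (d->stride != 1 && d->n > 0) {
        p = malloc((size_t)d->n * sizeof *p);
        if (p == NULL)
            abort();
        for (long i = 0; i < d->n; i++, copied++)
            p[i] = d->data[i * d->stride];
    }

    scale2(p, d->n);

    if (p != d->data) {   /* same pointer test as in the BEFORE dump */
        for (long i = 0; i < d->n; i++, copied++)
            d->data[i * d->stride] = p[i];
        free(p);
    }
    return copied;
}

/* AFTER-style: always allocate, copy in, call, copy out, free. */
long call_always_copy(desc1 *d)
{
    assert(d->n >= 0 && d->stride >= 1);
    long copied = 0;
    /* Allocate at least one byte, like the MAX_EXPR <..., 1> in the dump. */
    size_t bytes = d->n > 0 ? (size_t)d->n * sizeof(double) : 1;
    double *tmp = malloc(bytes);
    if (tmp == NULL)
        abort();

    for (long i = 0; i < d->n; i++, copied++)
        tmp[i] = d->data[i * d->stride];

    scale2(tmp, d->n);

    for (long i = 0; i < d->n; i++, copied++)
        d->data[i * d->stride] = tmp[i];

    free(tmp);
    return copied;
}
```

For a contiguous argument (stride 1), `call_pack_on_demand` copies nothing, while `call_always_copy` copies every element twice and pays for a malloc/free pair on each call.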
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539 --- Comment #14 from Martin Liška --- Ok, so I isolated that to a single file and one gfc_conv_subref_array_arg call. Problematic file is netcdf/netcdf.f90 and the gfc_conv_subref_array_arg call happens for: (gdb) p *expr $3 = { expr_type = EXPR_VARIABLE, ts = { type = BT_REAL, kind = 8, u = { derived = 0x0, cl = 0x0, pad = 0 }, interface = 0x0, is_c_interop = 0, is_iso_c = 0, f90_type = BT_UNKNOWN, deferred = false, interop_kind = 0x0 }, rank = 1, shape = 0x0, symtree = 0x27d3570, ref = 0x2b83f20, where = { nextc = 0x23bc358, lb = 0x23bc230 }, base_expr = 0x0, is_boz = 0, is_snan = 0, error = 0, user_operator = 0, mold = 0, must_finalize = 0, no_bounds_check = 0, external_blas = 0, do_not_resolve_again = 0, do_not_warn = 0, representation = { length = 0, string = 0x0 }, value = { logical = 0, iokind = M_READ, integer = {{ _mp_alloc = 0, _mp_size = 0, _mp_d = 0x0 }}, real = {{ _mpfr_prec = 0, _mpfr_sign = 0, _mpfr_exp = 0, _mpfr_d = 0x0 }}, complex = {{ re = {{ _mpfr_prec = 0, _mpfr_sign = 0, _mpfr_exp = 0, _mpfr_d = 0x0 }}, im = {{ _mpfr_prec = 0, _mpfr_sign = 0, _mpfr_exp = 0, _mpfr_d = 0x0 }} }}, op = { op = GFC_INTRINSIC_BEGIN, uop = 0x0, op1 = 0x0, op2 = 0x0 }, function = { actual = 0x0, name = 0x0, isym = 0x0, esym = 0x0 }, compcall = { actual = 0x0, name = 0x0, base_object = 0x0, tbp = 0x0, ignore_pass = 0, assign = 0 }, character = { length = 0, string = 0x0 }, constructor = 0x0 }, param_list = 0x0 } proc_name=0x15068d20 "nf_put_vara_double" (gdb) p *fsym $5 = { name = 0x144a2c20 "_formal_107", module = 0x0, declared_at = { nextc = 0x23c86d4, lb = 0x23c8590 }, ts = { type = BT_REAL, kind = 8, u = { derived = 0x0, cl = 0x0, pad = 0 }, interface = 0x0, is_c_interop = 0, is_iso_c = 0, f90_type = BT_UNKNOWN, deferred = false, interop_kind = 0x0 }, attr = { allocatable = 0, dimension = 1, codimension = 0, external = 0, intrinsic = 0, optional = 0, pointer = 0, target = 0, value = 0, volatile_ = 0, temporary = 0, 
dummy = 1, result = 0, assign = 0, threadprivate = 0, not_always_present = 0, implied_index = 0, subref_array_pointer = 0, proc_pointer = 0, asynchronous = 0, contiguous = 0, fe_temp = 0, automatic = 0, class_pointer = 0, save = SAVE_NONE, data = 0, is_protected = 0, use_assoc = 0, used_in_submodule = 0, use_only = 0, use_rename = 0, imported = 0, host_assoc = 0, in_namelist = 0, in_common = 0, in_equivalence = 0, function = 0, subroutine = 0, procedure = 0, generic = 0, generic_copy = 0, implicit_type = 0, untyped = 0, is_bind_c = 0, extension = 0, is_class = 0, class_ok = 0, vtab = 0, vtype = 0, is_c_interop = 0, is_iso_c = 0, sequence = 0, elemental = 0, pure = 0, recursive = 0, unmaskable = 0, masked = 0, contained = 0, mod_proc = 0, abstract = 0, module_procedure = 0, public_used = 0, implicit_pure = 0, array_outer_dependency = 0, noreturn = 0, entry = 0, entry_master = 0, mixed_entry_master = 0, always_explicit = 0, artificial = 0, referenced = 0, is_main_program = 0, access = ACCESS_UNKNOWN, intent = INTENT_UNKNOWN, flavor = FL_VARIABLE, if_source = IFSRC_UNKNOWN, proc = PROC_UNKNOWN, cray_pointer = 0, cray_pointee = 0, alloc_comp = 0, pointer_comp = 0, proc_pointer_comp = 0, private_comp = 0, zero_comp = 0, coarray_comp = 0, lock_comp = 0, event_comp = 0, defined_assign_comp = 0, unlimited_polymorphic = 0, has_dtio_procs = 0, caf_token = 0, select_type_temporary = 0, associate_var = 0, pdt_kind = 0, pdt_len = 0, pdt_type = 0, pdt_template = 0, pdt_array = 0, pdt_string = 0, omp_udr_artificial_var = 0, omp_declare_target = 0, omp_declare_target_link = 0, oacc_declare_create = 0, oacc_declare_copyin = 0, oacc_declare_deviceptr = 0, oacc_declare_device_resident = 0, oacc_declare_link = 0, oacc_routine_lop = OACC_ROUTINE_LOP_NONE, ext_attr = 0, volatile_ns =
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539 --- Comment #13 from Thomas Koenig --- I'm afraid the tree dumps will not help a lot - I know what they look like before and after, but I don't know what is wrong with them.

I would therefore ask you to reduce the test case, maybe starting with the wrong-code issue. I'm describing now what I would do if I had access to SPEC.

One possibility is using -Os. This restores the behavior of using the library function for packing / unpacking. You can check which file(s) you need to compile using that flag to make the problem go away. (A fancier way would be to introduce, in my local tree, a new option to specifically disable that optimization.)

The relevant part is in trans-array.c:

    /* When optimizing, we can use gfc_conv_subref_array_arg for
       making the packing and unpacking operation visible to the
       optimizers.  */

    if (g77 && optimize && !optimize_size && expr->expr_type == EXPR_VARIABLE
        && !is_pointer (expr) && (fsym == NULL
                                  || fsym->ts.type != BT_ASSUMED))
      {
        gfc_conv_subref_array_arg (se, expr, g77,
                                   fsym ? fsym->attr.intent : INTENT_INOUT,
                                   false, fsym, proc_name, sym);
        return;
      }

Once the file is known, I would set a breakpoint at the call to gfc_conv_subref_array_arg and look at expr, fsym and proc_name to pinpoint which part of the source code is affected.

Once that is known, I would debug the compiled program and check the conditions when the procedure is called - what kind of array is passed, what is its rank, what are the dimensions, is it contiguous, and what does the dummy argument on the callee's side look like - and work on reducing the test case from there.

Another point - maybe it would be a good idea to see how at least one of the regular Fortran people could get access to SPEC. I would be willing to sign an NDA, but I would _not_ be willing to pay for it. I suppose it would be no good to ask the FSF, they would probably go bananas :-)
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539 --- Comment #11 from Martin Liška --- Created attachment 46394 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46394=edit 521.wrf_r valgrind report
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539 --- Comment #12 from Martin Liška --- Created attachment 46395 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46395=edit 527.cam4_r valgrind report
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539 --- Comment #10 from Martin Liška --- (In reply to Thomas Koenig from comment #8)
> (In reply to Martin Liška from comment #6)
> > So there's somebody who is having the file in a public git repository.
> > That's probably violating SPEC rules :) But anyway, the .f90 file is here:
> > https://gitlab.bcamath.org/atrucchia/randomfront-wrfsfire-lsfire/blob/
> > 152f8c92b89b20021403acba9536553fda7a527b/wrfv2_fire/share/solve_interface.f90
> >
> > @Thomas: Is it enough info?
>
> I'm afraid not, it is neither complete nor self-contained (nor is the
> bug report in comment#7). So, this is not a valid bug report according
> to https://www.gnu.org/software/gcc/bugs/ . It's a heads-up, nothing
> more.

I'm sorry that we play the SPEC game with software that is not open source. But still, I'm willing to provide as much info as you need. Here are all the tree dumps before and after your revision:

https://drive.google.com/file/d/1rzT3B0n6iMDIFNv0Y8dbKG2G8zhA_1fA/view?usp=sharing
https://drive.google.com/file/d/1obnhaGDhXg6DmF5iEmchb7d7lNlf9fj-/view?usp=sharing

> SPEC is proprietary software, none of the Fortran maintainers has
> access to it. I will deal with this bug the same as I deal with
> all other bug reports - no test case, no possibility of fixing.

That's unfortunate, yes.
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539 nsz at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nsz at gcc dot gnu.org

--- Comment #9 from nsz at gcc dot gnu.org ---
spec2017 521.wrf_r never finishes on aarch64

gcc rev 271291 runs fine
gcc rev 271380 does not finish
(possibly a crash that the spec scripts don't detect)
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539 Thomas Koenig changed: What|Removed |Added Status|NEW |WAITING --- Comment #8 from Thomas Koenig --- (In reply to Martin Liška from comment #6) > So there's somebody who is having the file in a public git repository. > That's probably violating SPEC rules :) But anyway, the .f90 file is here: > https://gitlab.bcamath.org/atrucchia/randomfront-wrfsfire-lsfire/blob/ > 152f8c92b89b20021403acba9536553fda7a527b/wrfv2_fire/share/solve_interface.f90 > > @Thomas: Is it enough info? I'm afraid not, it is neither complete nor self-contained (nor is the bug report in comment#7). So, this is not a valid bug report according to https://www.gnu.org/software/gcc/bugs/ . It's a heads-up, nothing more. SPEC is proprietary software, none of the Fortran maintainers has access to it. I will deal with this bug the same as I deal with all other bug reports - no test case, no possibility of fixing.
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539 --- Comment #7 from Martin Liška --- Note that the patch is also responsible for a 521.wrf_r segfault with -Ofast -march=native on a Zen machine (with ulimit -s == unlimited):

Contents of wrf.err

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x14c7dd128e0f in ???
	at /usr/src/debug/glibc-2.29-5.1.x86_64/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
#1  0x1605961 in __module_ra_rrtm_MOD_rtrn
	at /home/marxin/Programming/cpu2017/benchspec/CPU/521.wrf_r/build/build_peak_gcc7-m64.0001/module_ra_rrtm.fppized.f90:6413
#2  0x161dddf in __module_ra_rrtm_MOD_rrtm
	at /home/marxin/Programming/cpu2017/benchspec/CPU/521.wrf_r/build/build_peak_gcc7-m64.0001/module_ra_rrtm.fppized.f90:2256
#3  0x161edd2 in __module_ra_rrtm_MOD_rrtmlwrad
	at /home/marxin/Programming/cpu2017/benchspec/CPU/521.wrf_r/build/build_peak_gcc7-m64.0001/module_ra_rrtm.fppized.f90:1994
#4  0x1631f8d in __module_radiation_driver_MOD_radiation_driver
	at /home/marxin/Programming/cpu2017/benchspec/CPU/521.wrf_r/build/build_peak_gcc7-m64.0001/module_radiation_driver.fppized.f90:1106
#5  0x11776e3 in __module_first_rk_step_part1_MOD_first_rk_step_part1
	at /home/marxin/Programming/cpu2017/benchspec/CPU/521.wrf_r/build/build_peak_gcc7-m64.0001/module_first_rk_step_part1.fppized.f90:367
#6  0x18b6c15 in solve_em_
	at /home/marxin/Programming/cpu2017/benchspec/CPU/521.wrf_r/build/build_peak_gcc7-m64.0001/solve_em.fppized.f90:837
#7  0x1906eb3 in solve_interface_
	at /home/marxin/Programming/cpu2017/benchspec/CPU/521.wrf_r/build/build_peak_gcc7-m64.0001/solve_interface.fppized.f90:135
#8  0x127a53b in __module_integrate_MOD_integrate
	at /home/marxin/Programming/cpu2017/benchspec/CPU/521.wrf_r/build/build_peak_gcc7-m64.0001/module_integrate.fppized.f90:306
#9  0x17e5171 in __module_wrf_top_MOD_wrf_run
	at /home/marxin/Programming/cpu2017/benchspec/CPU/521.wrf_r/build/build_peak_gcc7-m64.0001/module_wrf_top.fppized.f90:309
#10  0x404b02 in wrf
	at /home/marxin/Programming/cpu2017/benchspec/CPU/521.wrf_r/build/build_peak_gcc7-m64.0001/wrf.fppized.f90:28
#11  0x404b02 in main
	at /home/marxin/Programming/cpu2017/benchspec/CPU/521.wrf_r/build/build_peak_gcc7-m64.0001/wrf.fppized.f90:6
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539 Martin Liška changed: What|Removed |Added Status|WAITING |NEW --- Comment #6 from Martin Liška --- So there's somebody who is having the file in a public git repository. That's probably violating SPEC rules :) But anyway, the .f90 file is here: https://gitlab.bcamath.org/atrucchia/randomfront-wrfsfire-lsfire/blob/152f8c92b89b20021403acba9536553fda7a527b/wrfv2_fire/share/solve_interface.f90 @Thomas: Is it enough info?
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539 --- Comment #5 from Martin Liška --- Ok, looking at perf report:

$ head -n20 before.report.txt
# Overhead  Command          Shared Object            Symbol
# ... ... ...
#
     7.45%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_advect_em_MOD_advect_scalar
     5.54%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_small_step_em_MOD_advance_w
     5.48%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_small_step_em_MOD_advance_uv
     5.45%  wrf_peak.amd64-  libc-2.29.so             [.] __memset_avx2_unaligned_erms
     4.51%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_big_step_utilities_em_MOD_calc_cq
     3.84%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_bl_ysu_MOD_ysu2d
     3.80%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_small_step_em_MOD_calc_p_rho
     3.71%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_small_step_em_MOD_advance_mu_t
     3.55%  wrf_peak.amd64-  libmvec-2.29.so          [.] _ZGVdN8vv_powf_avx2
     3.45%  wrf_peak.amd64-  libc-2.29.so             [.] __memmove_avx_unaligned_erms
     2.82%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_small_step_em_MOD_sumflux
     2.69%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_em_MOD_rk_update_scalar
     2.45%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_mp_lin_MOD_clphy1d
     2.24%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_mp_lin_MOD_lin_et_al
     2.16%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_small_step_em_MOD_small_step_prep
     2.09%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_big_step_utilities_em_MOD_curvature
     2.07%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_em_MOD_rk_addtend_dry

$ head -n20 after.report.txt
# Overhead  Command          Shared Object            Symbol
# ... ... ...
#
    19.91%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] solve_interface_
     5.99%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_advect_em_MOD_advect_scalar
     4.59%  wrf_peak.amd64-  libc-2.29.so             [.] __memset_avx2_unaligned_erms
     4.45%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_small_step_em_MOD_advance_w
     4.30%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_small_step_em_MOD_advance_uv
     3.63%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_big_step_utilities_em_MOD_calc_cq
     3.11%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_small_step_em_MOD_calc_p_rho
     3.10%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_bl_ysu_MOD_ysu2d
     3.02%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_small_step_em_MOD_advance_mu_t
     2.77%  wrf_peak.amd64-  libc-2.29.so             [.] __memmove_avx_unaligned_erms
     2.75%  wrf_peak.amd64-  libmvec-2.29.so          [.] _ZGVdN8vv_powf_avx2
     2.31%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_small_step_em_MOD_sumflux
     2.10%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_em_MOD_rk_update_scalar
     1.87%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_mp_lin_MOD_clphy1d
     1.78%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_mp_lin_MOD_lin_et_al
     1.69%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_small_step_em_MOD_small_step_prep
     1.68%  wrf_peak.amd64-  wrf_peak.amd64-m64-mine  [.] __module_big_step_utilities_em_MOD_curvature

The difference is in solve_interface_, I'll analyze that..
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539 Thomas Koenig changed: What|Removed |Added Status|NEW |WAITING --- Comment #4 from Thomas Koenig --- Waiting for a test case.
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539 --- Comment #3 from Thomas Koenig --- I think I have an idea what might be the problem. Does the code do something like

  call foo(a)
  ...
  subroutine foo(a)
    real, dimension(:) :: a
    call bar(a,size(a))
  ...
  subroutine bar(a,n)
    real, dimension(n) :: a

? What might be missing for good performance is the check for contiguous memory when calling bar.
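As an illustration of the kind of contiguity check mentioned here, the following C sketch tests whether a column-major array described by extents and strides occupies one dense block of memory. The `shape_desc` type, the `MAX_RANK` limit and the `is_contiguous` function are hypothetical and are not taken from gfortran or libgfortran; strides are counted in elements. Treating arrays with at most one element as contiguous is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_RANK 7   /* illustrative limit, an assumption of this sketch */

/* Hypothetical shape description; NOT libgfortran's descriptor. */
typedef struct {
    int rank;
    long extent[MAX_RANK];   /* number of elements per dimension */
    long stride[MAX_RANK];   /* step per dimension, counted in elements */
} shape_desc;

/* Returns true if the elements form one dense block in column-major
   order, so the data could be passed without a packed temporary. */
bool is_contiguous(const shape_desc *d)
{
    assert(d->rank >= 1 && d->rank <= MAX_RANK);

    /* Arrays with zero or one element are trivially contiguous. */
    long size = 1;
    for (int k = 0; k < d->rank; k++)
        size *= d->extent[k] > 0 ? d->extent[k] : 0;
    if (size <= 1)
        return true;

    /* Dimension k must step over exactly the elements of the lower
       dimensions; the stride of an extent-1 dimension does not matter. */
    long expected = 1;
    for (int k = 0; k < d->rank; k++) {
        if (d->extent[k] != 1 && d->stride[k] != expected)
            return false;
        expected *= d->extent[k];
    }
    return true;
}
```

A caller in the position of `foo` could use such a test to pass the original data straight through to `bar` when it returns true, and fall back to copying into a temporary only when it returns false.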
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539 --- Comment #2 from Thomas Koenig --- I am a bit surprised that the library version of packing seems to be faster than the inlined one. Or maybe some argument is now packed which should not be. Increased code size is sort of expected, since copying inline is bigger than calling a library function. This is why this is not done at -Os. Is it possible to get a reduced test case that shows the slowdown?
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539 --- Comment #1 from Richard Biener --- Haswell as well (https://gcc.opensuse.org/gcc-old/SPEC/CFP/sb-czerny-head-64-2006/recent.html) but only 10% and not bisected.
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539 Martin Liška changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-05-20
                 CC|                            |tkoenig at gcc dot gnu.org
      Known to work|                            |9.1.0
            Version|unknown                     |10.0
             Blocks|                            |26163
   Target Milestone|---                         |10.0
     Ever confirmed|0                           |1
      Known to fail|                            |10.0

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)