Sure - once a change works locally [for both gcc and xlf].
When I try it - I get a bunch of errors [yet to digest them].
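
For reference, a rough sketch of what an 'only:' variant of that fd.write() line could look like. This is not the code in generatefortranstubs.py: needed_types() and use_statement() are made-up helper names, and the 6-space indent is a guess. The sketch assumes the declarations have already been expanded to type(tXXX) form and that every such type lives in petsc<mansec>def, which may well not hold [e.g. a Mat interface that takes a Vec].

```python
import re

# Hypothetical helpers, not part of generatefortranstubs.py.
# Matches derived-type references such as "type(tVec)" (Fortran is
# case-insensitive, so "TYPE(tvec)" is treated as the same type).
TYPE_RE = re.compile(r'\btype\s*\(\s*(\w+)\s*\)', re.IGNORECASE)

def needed_types(decl_lines):
    """Return derived-type names used in decl_lines, in first-seen order."""
    seen = {}
    for line in decl_lines:
        for name in TYPE_RE.findall(line):
            # Key on the lower-cased name so tVec and tvec count once.
            seen.setdefault(name.lower(), name)
    return list(seen.values())

def use_statement(mansec, decl_lines):
    """Build a restricted 'use' line for module petsc<mansec>def.

    Assumption: every derived type in decl_lines comes from that one module.
    """
    names = needed_types(decl_lines)
    if not names:
        return ''  # no derived types referenced, so nothing to import
    return '      use petsc' + mansec + 'def, only: ' + ', '.join(names) + '\n'

decls = [
    '      real(kind=selected_real_kind(10)), pointer :: array(:)',
    '      integer(kind=selected_int_kind(5)) ierr',
    '      type(tVec) v',
]
print(use_statement('vec', decls), end='')
```

On the VecRestoreArrayReadF90 declarations quoted below, this prints a 'use petscvecdef, only: tVec' line, i.e. the form IBM suggested. Whether the generator can actually see the expanded declarations at that point is something I have not checked.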
Satish
On Wed, 3 Mar 2021, Jacob Faibussowitsch wrote:
> > I'm not sure what would happen if these 'use' statements are removed [what's
> > required and what can be removed?]
> >
> > The relevant code that adds this is in
> > lib/petsc/bin/maint/generatefortranstubs.py
> >
> > fd.write(' use petsc'+mansec+'def\n')
>
> I suppose we can run it through CI, see if it breaks?
>
> Best regards,
>
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
> Cell: (312) 694-3391
>
> > On Mar 3, 2021, at 12:49, Satish Balay <[email protected]> wrote:
> >
> > On Wed, 3 Mar 2021, Jacob Faibussowitsch wrote:
> >
> >> Hello All,
> >>
> >> I discovered a compiler bug in the IBM xl fortran compiler a few weeks ago
> >> that would crash the compiler when compiling petsc fortran interfaces. The
> >> TL;DR of it is that the xl compiler creates a function dictionary for
> >> every function imported in fortran modules, and since petsc fortran
> >> interfaces seem to import entire packages writ-large this exceeds the
> >> number of dictionary entries (2**21):
> >>
> >>> The reason for the Internal Compiler Error is because we can't grow an
> >>> internal dictionary anymore (i.e. we hit a 2**21 limit).
> >>> The file contains many module procedures and interfaces that use the same
> >>> helper module. As a result, we are importing the dictionary entries for
> >>> that module repeatedly reaching
> >>> the limit.
> >>>
> >>> Can you please give the following source code workaround a try?
> >>> Since there is already "use petscvecdefdummy" at the module scope, one
> >>> workaround might be to remove the unnecessary "use petscvecdefdummy" in
> >>> vecnotequal and vecequals
> >>> and all similar procedures.
> >>>
> >>> For example, the test case has:
> >>> module petscvecdef
> >>> use petscvecdefdummy
> >>> ...
> >>> function vecnotequal(A,B)
> >>> use petscvecdefdummy
> >>> logical vecnotequal
> >>> type(tVec), intent(in) :: A,B
> >>> vecnotequal = (A%v .ne. B%v)
> >>> end function
> >>> function vecequals(A,B)
> >>> use petscvecdefdummy
> >>> logical vecequals
> >>> type(tVec), intent(in) :: A,B
> >>> vecequals = (A%v .eq. B%v)
> >>> end function
> >>> ...
> >>> end module
> >>> Another workaround would be to put the procedure definitions from this
> >>> large module into several submodules. Each submodule would be able to
> >>> accommodate a dictionary with 2**21 entries.
> >>>
> >>>
> >>> Please let us know if one of the above workarounds resolve the issue.
> >>
> >>
> >> The proposed fix from IBM would be to pull “use moduleXXX” out of
> >> subroutines or to have our auto-fortran interfaces detect which symbols to
> >> include from the respective modules and only include those in the
> >> subroutines. I’m not familiar at all with how the interfaces are generated
> >> so I don’t even know if this is possible.
> >
> > I'm not sure what would happen if these 'use' statements are removed [what's
> > required and what can be removed?]
> >
> > The relevant code that adds this is in
> > lib/petsc/bin/maint/generatefortranstubs.py
> >
> > fd.write(' use petsc'+mansec+'def\n')
> >
> > Satish
> >
> >>> IBM provided the following additional explanation and example. Can the
> >>> process used to generate these routines and functions determine the
> >>> specific symbols required and then use the only keyword or import
> >>> statement to include them?
> >>>
> >>> When factoring out use statements out of module procedures, you can just
> >>> delete them. But you can't completely remove them from interface blocks.
> >>> Instead, you can limit them either by using use <module>, only: <symbol>
> >>> or import <symbol>. If the hundreds of use statements in the program are
> >>> factored out / limited in this way, that should reduce the dictionary
> >>> size sufficiently for the program to compile.
> >>>
> >>> For example
> >>> Interface
> >>> Subroutine VecRestoreArrayReadF90(v,array,ierr)
> >>> use petscvecdef
> >>> real(kind=selected_real_kind(10)), pointer :: array(:)
> >>> integer(kind=selected_int_kind(5)) ierr
> >>> type(tVec) v
> >>> End Subroutine
> >>> End Interface
> >>>
> >>> imports all symbols from petscvecdef into the dictionary even though we
> >>> only need tVec . So we can either:
> >>>
> >>> Interface
> >>> Subroutine VecRestoreArrayReadF90(v,array,ierr)
> >>> use petscvecdef, only: tVec
> >>> implicit none
> >>> real(kind=selected_real_kind(10)), pointer :: array(:)
> >>> integer(kind=selected_int_kind(5)) ierr
> >>> type(tVec) v
> >>> End Subroutine
> >>> End Interface
> >>>
> >>> or if use petscvecdef is used in the outer scope, we can:
> >>> Interface
> >>> Subroutine VecRestoreArrayReadF90(v,array,ierr)
> >>> import tVec
> >>> implicit none
> >>> real(kind=selected_real_kind(10)), pointer :: array(:)
> >>> integer(kind=selected_int_kind(5)) ierr
> >>> type(tVec) v
> >>> End Subroutine
> >>> End Interface
> >>> (The two methods (use, only vs import) are equivalent in terms of impact
> >>> to the dictionary.)
> >>>
> >>
> >> Is this compiler ~feature~ something that we intend to work around?
> >> Thoughts?
> >>
> >> Best regards,
> >>
> >> Jacob Faibussowitsch
> >> (Jacob Fai - booss - oh - vitch)
> >> Cell: (312) 694-3391
> >>
> >>> Begin forwarded message:
> >>>
> >>> From: "Roy Musselman" <[email protected]>
> >>> Subject: Re: Case TS005062693 - XLF: ICE in xlfentry compiling a module
> >>> with 358 subroutines
> >>> Date: March 3, 2021 at 08:23:17 CST
> >>> To: Jacob Faibussowitsch <[email protected]>
> >>> Cc: "Gyllenhaal, John C." <[email protected]>
> >>>
> >>> Hi Jacob,
> >>> I tried the first suggestion and commented out the use statements called
> >>> within the functions. However, I hit the following error complaining
> >>> about specific symbol dependencies provided by the library.
> >>>
> >>> .../src/vec/f90-mod/petscvecmod.F90", line 107.37: 1514-084 (S)
> >>> Identifier a is being declared with type name tvec which has not been
> >>> defined in a derived type definition.
> >>>
> >>> IBM provided the following additional explanation and example. Can the
> >>> process used to generate these routines and functions determine the
> >>> specific symbols required and then use the only keyword or import
> >>> statement to include them?
> >>>
> >>> When factoring out use statements out of module procedures, you can just
> >>> delete them. But you can't completely remove them from interface blocks.
> >>> Instead, you can limit them either by using use <module>, only: <symbol>
> >>> or import <symbol>. If the hundreds of use statements in the program are
> >>> factored out / limited in this way, that should reduce the dictionary
> >>> size sufficiently for the program to compile.
> >>>
> >>> For example
> >>> Interface
> >>> Subroutine VecRestoreArrayReadF90(v,array,ierr)
> >>> use petscvecdef
> >>> real(kind=selected_real_kind(10)), pointer :: array(:)
> >>> integer(kind=selected_int_kind(5)) ierr
> >>> type(tVec) v
> >>> End Subroutine
> >>> End Interface
> >>>
> >>> imports all symbols from petscvecdef into the dictionary even though we
> >>> only need tVec . So we can either:
> >>>
> >>> Interface
> >>> Subroutine VecRestoreArrayReadF90(v,array,ierr)
> >>> use petscvecdef, only: tVec
> >>> implicit none
> >>> real(kind=selected_real_kind(10)), pointer :: array(:)
> >>> integer(kind=selected_int_kind(5)) ierr
> >>> type(tVec) v
> >>> End Subroutine
> >>> End Interface
> >>>
> >>> or if use petscvecdef is used in the outer scope, we can:
> >>> Interface
> >>> Subroutine VecRestoreArrayReadF90(v,array,ierr)
> >>> import tVec
> >>> implicit none
> >>> real(kind=selected_real_kind(10)), pointer :: array(:)
> >>> integer(kind=selected_int_kind(5)) ierr
> >>> type(tVec) v
> >>> End Subroutine
> >>> End Interface
> >>> (The two methods (use, only vs import) are equivalent in terms of impact
> >>> to the dictionary.)
> >>>
> >>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >>> Roy Musselman
> >>> IBM HPC Application Analyst at Lawrence Livermore National Lab
> >>> email: [email protected] <mailto:[email protected]>
> >>> <mailto:[email protected] <mailto:[email protected]>>
> >>> LLNL office: 925-422-6033
> >>> Cell: 507-358-8895, Home: 507-281-9565
> >>>
> >>> Roy Musselman---02/24/2021 07:08:45 PM---Hi Jacob, I opened the ticket
> >>> with IBM: case TS005062693 and the local LLNL Sierra Jira Ticket
> >>>
> >>> From: Roy Musselman/Rochester/Contr/IBM
> >>> To: Jacob Faibussowitsch <[email protected]
> >>> <mailto:[email protected]> <mailto:[email protected]
> >>> <mailto:[email protected]>>>
> >>> Cc: "Gyllenhaal, John C." <[email protected]
> >>> <mailto:[email protected]> <mailto:[email protected]
> >>> <mailto:[email protected]>>>
> >>> Date: 02/24/2021 07:08 PM
> >>> Subject: Re: [EXTERNAL] Case TS005062693 - XLF: ICE in xlfentry
> >>> compiling a module with 358 subroutines
> >>>
> >>>
> >>>
> >>> Hi Jacob,
> >>> I opened the ticket with IBM: case TS005062693 and the local LLNL
> >>> Sierra Jira Ticket at
> >>> https://lc.llnl.gov/jira/projects/SIERRA/issues/SIERRA-111?filter=allissues
> >>>
> >>> Today IBM provided the response below. I don't know when I'll have time
> >>> to try it on the reproducer I gave IBM. Perhaps early next week. Can you
> >>> review this and see if it helps?
> >>>
> >>> The reason for the Internal Compiler Error is because we can't grow an
> >>> internal dictionary anymore (i.e. we hit a 2**21 limit).
> >>> The file contains many module procedures and interfaces that use the same
> >>> helper module. As a result, we are importing the dictionary entries for
> >>> that module repeatedly reaching
> >>> the limit.
> >>>
> >>> Can you please give the following source code workaround a try?
> >>> Since there is already "use petscvecdefdummy" at the module scope, one
> >>> workaround might be to remove the unnecessary "use petscvecdefdummy" in
> >>> vecnotequal and vecequals
> >>> and all similar procedures.
> >>>
> >>> For example, the test case has:
> >>> module petscvecdef
> >>> use petscvecdefdummy
> >>> ...
> >>> function vecnotequal(A,B)
> >>> use petscvecdefdummy
> >>> logical vecnotequal
> >>> type(tVec), intent(in) :: A,B
> >>> vecnotequal = (A%v .ne. B%v)
> >>> end function
> >>> function vecequals(A,B)
> >>> use petscvecdefdummy
> >>> logical vecequals
> >>> type(tVec), intent(in) :: A,B
> >>> vecequals = (A%v .eq. B%v)
> >>> end function
> >>> ...
> >>> end module
> >>> Another workaround would be to put the procedure definitions from this
> >>> large module into several submodules. Each submodule would be able to
> >>> accommodate a dictionary with 2**21 entries.
> >>>
> >>>
> >>> Please let us know if one of the above workarounds resolve the issue.
> >>>
> >>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >>> Roy Musselman
> >>> IBM HPC Application Analyst at Lawrence Livermore National Lab
> >>> email: [email protected] <mailto:[email protected]>
> >>> <mailto:[email protected] <mailto:[email protected]>>
> >>> LLNL office: 925-422-6033
> >>> Cell: 507-358-8895, Home: 507-281-9565
> >>>
> >>>
> >>> Roy Musselman---02/21/2021 09:42:55 PM---Hi Jacob, After some more
> >>> experimentation, I think I may have found what is triggering the ICE. It
> >>>
> >>> From: Roy Musselman/Rochester/Contr/IBM
> >>> To: Jacob Faibussowitsch <[email protected]
> >>> <mailto:[email protected]> <mailto:[email protected]
> >>> <mailto:[email protected]>>>
> >>> Cc: "Gyllenhaal, John C." <[email protected]
> >>> <mailto:[email protected]> <mailto:[email protected]
> >>> <mailto:[email protected]>>>
> >>> Date: 02/21/2021 09:42 PM
> >>> Subject: Re: [EXTERNAL] Re: xlf90_r Internal Compiler Error
> >>>
> >>>
> >>> Hi Jacob,
> >>>
> >>> After some more experimentation, I think I may have found what is
> >>> triggering the ICE. It doesn't appear to be related to the subroutine
> >>> name length. I think the compiler may be hitting an internal limit of the
> >>> number of subroutines within a module. There are 358 subroutines
> >>> contained in the expanded petscmatmod.F90. Removing 4 subroutines will
> >>> allow the compile to complete successfully, so the limit must be 354
> >>> subroutines. Is it possible for you to bust up petscmatmod into multiple
> >>> modules? I'll package up the reproducer and pass it on to the compiler
> >>> development team.
> >>>
> >>> I've asked for user feedback a couple years ago, when the IBM Power9
> >>> CORAL-1 Sierra systems were deployed, but received minimal responses. DOE
> >>> is now working with Cray (aka HPE) developing the environment for the
> >>> CORAL-2 system (El Capitan). I'll pass your request to the LLNL person I
> >>> know that is dealing with math libraries for CORAL-2.
> >>>
> >>> We use the spack tool to download and build petsc and its specified
> >>> dependencies. I switched between the PETSC versions by changing the
> >>> PETSCDIR variable in the script I shared with you. I've attached a tar
> >>> ball containing the scripts used to build PETSc via spack.
> >>>
> >>> [attachment "bld-petsc-spack.tgz" deleted by Roy
> >>> Musselman/Rochester/Contr/IBM]
> >>>
> >>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >>> Roy Musselman
> >>> IBM HPC Application Analyst at Lawrence Livermore National Lab
> >>> email: [email protected] <mailto:[email protected]>
> >>> <mailto:[email protected] <mailto:[email protected]>>
> >>> LLNL office: 925-422-6033
> >>> Cell: 507-358-8895, Home: 507-281-9565
> >>>
> >>>
> >>> Jacob Faibussowitsch ---02/21/2021 12:24:11 PM---Hi Roy, > I'm not sure
> >>> which projects at LLNL are using PETSc or if they chose to build their
> >>> own ve
> >>>
> >>> From: Jacob Faibussowitsch <[email protected]
> >>> <mailto:[email protected]> <mailto:[email protected]
> >>> <mailto:[email protected]>>>
> >>> To: Roy Musselman <[email protected] <mailto:[email protected]>
> >>> <mailto:[email protected] <mailto:[email protected]>>>
> >>> Cc: "Gyllenhaal, John C." <[email protected]
> >>> <mailto:[email protected]> <mailto:[email protected]
> >>> <mailto:[email protected]>>>
> >>> Date: 02/21/2021 12:24 PM
> >>> Subject: [EXTERNAL] Re: xlf90_r Internal Compiler Error
> >>>
> >>>
> >>>
> >>> Hi Roy,
> >>>
> >>> > I'm not sure which projects at LLNL are using PETSc or if they chose to
> >>> > build their own version.
> >>>
> >>> Entirely unrelated to our problem, but is it possible to find this out?
> >>> It would be great if yes, but also completely fine if not. PETSc is
> >>> potentially undergoing a rather transformative rewrite over the next few
> >>> years and we’d like to gather current usage data to get a better idea of
> >>> where PETSc fits into our users’ workflows. But we aren’t sure how to
> >>> gather this data (we don’t particularly want to scrape and silently send
> >>> it off without users’ consent/knowledge) absent user questionnaires and
> >>> HPC usage statistics.
> >>>
> >>> > If you are interested, I can share with you the spack recipes I use to
> >>> > build petsc with hdf5, hypre, and superlu-dist.
> >>>
> >>> Yes that would be quite useful. I can let it percolate through our dev
> >>> channels for any other recommendations etc.
> >>>
> >>> > 3.14.0 and 3.14.1
> >>> >
> >>> > "../roymuss/spack-stage-petsc-3.14.0-on3lboy4slkz65tsjttgfmwghzky54jj/spack-src/src/vec/f90-mod/petscvecmod.F90",
> >>> > line 9.13: 1514-219 (S) Unable to access module symbol file for module
> >>> > petscisdefdummy. Check path and file permissions of file. Use association
> >>> > not done for this module.
> >>> > 1501-511 Compilation failed for file petscvecmod.F90.
> >>>
> >>> How exactly did you switch between versions? PETSc has 2 types of fortran
> >>> bindings, “ftn-custom” and “ftn-auto” (technically 3 including the F90
> >>> files, but those simply call either of the two preceding ones), a copy of
> >>> which you will find in every src directory. As the names imply ftn-auto
> >>> is auto generated while ftn-custom is hand-written.
> >>>
> >>> This also means that the ftn-auto files are __not__ tracked by git, so a
> >>> simple git checkout [new-tag] may not properly dispose of the old
> >>> auto-generated files (very rare, but IIRC we made a major enough change
> >>> to the fortran bindings within the last year to warrant having to "make
> >>> deletefortranstubs" before rebuilding).
> >>>
> >>> > Adding the option -qlanglvl=2003std or -qlanglvl=2008std produces a bunch
> >>> > of other warning messages, but it still encounters the ICE. So, I'm
> >>> > uncertain if the subroutine name length is the root of the problem.
> >>>
> >>> Our current compiler flag selection philosophy is to require a minimum
> >>> but choose the maximum available reasonable flag for the compiler (i.e.
> >>> we require C99, but very often you will find that your code is compiled
> >>> with C11 or C17 if they are available). It is therefore odd that
> >>> configure did not use the same methodology for fortran compilers. I will
> >>> relay this on our side.
> >>>
> >>> > Is it possible for you to use subroutines that are less than 32
> >>> > characters and see if that works for you? Have you used other fortran 90
> >>> > compilers and do any of them complain of this?
> >>>
> >>> Of all of the small quirks fortran has this is probably the most esoteric
> >>> one I’ve come across… I’ve attached a list of all the F90 compilers, and
> >>> their flags which we use in CI/CD (all of which is run multiple times
> >>> daily and __must__ pass). I got them all via grep, so there may be some
> >>> duplicates here or there. As for using shorter names, this is also
> >>> something we can look at, but since none of the other compilers have had
> >>> issues with this I’m not sure this is the change to make.
> >>>
> >>> > Are there any unusual or questionable language constructs used in any of
> >>> > the functions mentioned above that may possibly challenge the compiler?
> >>>
> >>> Not that I am aware of, but again I will ask around our dev channels and
> >>> see if anything comes to mind.
> >>>
> >>>
> >>> Best regards,
> >>>
> >>> Jacob Faibussowitsch
> >>> (Jacob Fai - booss - oh - vitch)
> >>> Cell: (312) 694-3391
> >>>
> >>> [attachment "compilerList" deleted by Roy Musselman/Rochester/Contr/IBM]
> >>>
> >>> On Feb 20, 2021, at 22:05, Roy Musselman <[email protected]> wrote:
> >>> Hi Jacob,
> >>> Thanks for letting me know that you are a PETSc developer and that you
> >>> are testing it on the LLNL lassen system. I've used the spack build tool
> >>> to build and deploy a few versions on the systems. I'm not sure which
> >>> projects at LLNL are using PETSc or if they chose to build their own
> >>> version. I did however provide a single precision version upon request
> >>> that was integrated with MVAPICH2-MPI instead of the IBM-provided
> >>> Spectrum-MPI. Here's what's available on the systems today.
> >>>
> >>>> ml avail petsc
> >>> -----------------------------------------------------
> >>> /usr/tcetmp/modulefiles/Core
> >>> -----------------------------------------------------
> >>> petsc/default petsc/3.10.2 petsc/3.11.3 petsc/3.13.0 (D)
> >>> petsc/3.13.1-mvapich2-2020.01.09-xl-2020.03.18.single
> >>>
> >>> If you are interested, I can share with you the spack recipes I use to
> >>> build petsc with hdf5, hypre, and superlu-dist.
> >>>
> >>> After several attempts I was able to reproduce the Internal Compiler
> >>> Error (ICE) that you are seeing using version 3.14.4. I've whittled it
> >>> down to the petscmatmod.F90 file and its specific dependencies.
> >>> The following script is what I'm using. Note that in the 2nd set of
> >>> compiles, the -E option is used to expand all included source files and
> >>> headers into a single large source file. This can be
> >>> used to help isolate the source of the problem.
> >>>
> >>> #!/bin/bash
> >>>
> >>> PETSCDIR="../roymuss/spack-stage-petsc-3.14.4-eh5arny7l3cqjlltlfpjp6f4jofbnmz6/spack-src"
> >>>
> >>> OPTIONS="-qmoddir=moddir -I$PETSCDIR/arch-linux-c-opt/include -I$PETSCDIR/include"
> >>> mkdir -p moddir
> >>>
> >>> set -x
> >>>
> >>> # Compile original source files including dependencies
> >>> if [ 0 = 1 ]; then
> >>>   mpif90 -c -g $OPTIONS $PETSCDIR/src/sys/f90-mod/petscsysmod.F90 \
> >>>     -o petscsysmod.o
> >>>   mpif90 -c -g $OPTIONS $PETSCDIR/src/vec/f90-mod/petscvecmod.F90 \
> >>>     -o petscvecmod.o
> >>>   mpif90 -c -g $OPTIONS $PETSCDIR/src/mat/f90-mod/petscmatmod.F90 \
> >>>     -o petscmatmod.o
> >>> fi
> >>>
> >>> # Use -E option to expand source into full source files
> >>> if [ 0 = 1 ]; then
> >>>   mpif90 -c -g -E $OPTIONS $PETSCDIR/src/sys/f90-mod/petscsysmod.F90 \
> >>>     -o full_petscsysmod.F90
> >>>   mpif90 -c -g -E $OPTIONS $PETSCDIR/src/vec/f90-mod/petscvecmod.F90 \
> >>>     -o full_petscvecmod.F90
> >>>   mpif90 -c -g -E $OPTIONS $PETSCDIR/src/mat/f90-mod/petscmatmod.F90 \
> >>>     -o full_petscmatmod.F90
> >>> fi
> >>>
> >>> # Compile from full source files
> >>> if [ 1 = 1 ]; then
> >>>   mpif90 -c -g -Imoddir -qmoddir=moddir full_petscsysmod.F90 \
> >>>     -o full_petscsysmod.o
> >>>   mpif90 -c -g -Imoddir -qmoddir=moddir full_petscvecmod.F90 \
> >>>     -o full_petscvecmod.o
> >>>   mpif90 -V -c -g -Imoddir -qmoddir=moddir full_petscmatmod.F90 \
> >>>     -o full_petscmatmod.o
> >>> fi
> >>>
> >>> <eof>
> >>>
> >>> PETSc 3.13.6 is the most recent version that did not fail. I tried all
> >>> subsequent versions and got the following results:
> >>>
> >>> 3.14.0 and 3.14.1
> >>>
> >>> "../roymuss/spack-stage-petsc-3.14.0-on3lboy4slkz65tsjttgfmwghzky54jj/spack-src/src/vec/f90-mod/petscvecmod.F90",
> >>> line 9.13: 1514-219 (S) Unable to access module symbol file for module
> >>> petscisdefdummy. Check path and file permissions of file. Use association
> >>> not done for this module.
> >>> 1501-511 Compilation failed for file petscvecmod.F90.
> >>>
> >>> 3.14.2, 3.14.3, and 3.14.4
> >>>
> >>> . . .
> >>> ** matnullspaceequals === End of Compilation 8 ===
> >>> *** Error in
> >>> `/usr/tce/packages/xl/xl-2020.11.12/xlf/16.1.1/exe/xlfentry': free():
> >>> invalid pointer: 0x0000200001740018 ***
> >>>
> >>> Examining the tail end of petscmatmod.F90
> >>>
> >>>
> >>> 80 function matnullspaceequals(A,B)
> >>> 81 use petscmatdefdummy
> >>> 82 logical matnullspaceequals
> >>> 83 type(tMatNullSpace), intent(in) :: A,B
> >>> 84 matnullspaceequals = (A%v .eq. B%v)
> >>> 85 end function
> >>> 86
> >>> 87 #if defined(_WIN32) && defined(PETSC_USE_SHARED_LIBRARIES)
> >>> 88 !DEC$ ATTRIBUTES DLLEXPORT::matnotequal
> >>> 89 !DEC$ ATTRIBUTES DLLEXPORT::matequals
> >>> 90 !DEC$ ATTRIBUTES DLLEXPORT::matfdcoloringnotequal
> >>> 91 !DEC$ ATTRIBUTES DLLEXPORT::matfdcoloringequals
> >>> 92 !DEC$ ATTRIBUTES DLLEXPORT::matnullspacenotequal
> >>> 93 !DEC$ ATTRIBUTES DLLEXPORT::matnullspaceequals
> >>> 94 #endif
> >>> 95 module petscmat
> >>> 96 use petscmatdef
> >>> 97 use petscvec
> >>> 98 #include <../src/mat/f90-mod/petscmat.h90>
> >>> 99 interface
> >>> 100 #include <../src/mat/f90-mod/ftn-auto-interfaces/petscmat.h90>
> >>> 101 end interface
> >>> 102 end module
> >>> 103
> >>>
> >>> Compiling the matnullspaceequals function was successful just before
> >>> hitting the error. The error goes away when removing either or both of
> >>> the #include lines 98 and 100. Both #include statements are required to
> >>> produce the error. The 3.13.6 and 3.14.4 versions of the file identified
> >>> in the first #include at line 98 are identical. The file identified in
> >>> line 100 is different between 3.13.6 and 3.14.4.
> >>> Just looking at the list of subroutines contained within each version,
> >>> the following are the differences.
> >>>
> >>> Old subroutines available in 3.13.6 but removed from 3.14.4
> >>> subroutine MatFreeIntermediateDataStructures(a,z)
> >>>
> >>> New subroutines available in 3.14.4 but not contained in 3.13.6
> >>> subroutine MatDenseReplaceArray(a,b,z)
> >>> subroutine MatIsShell(a,b,z)
> >>> subroutine MatRARtMultEqual(a,b,c,d,e,z)
> >>> subroutine MatScaLAPACKGetBlockSizes(a,b,c,z)
> >>> subroutine MatScaLAPACKSetBlockSizes(a,b,c,z)
> >>> subroutine MatSeqAIJCUSPARSESetGenerateTranspose(a,b,z)
> >>> subroutine MatSeqAIJSetTotalPreallocation(a,b,z)
> >>> subroutine MatSetLayouts(a,b,c,z)
> >>>
> >>> Methodically removing the new subroutines did not provide a consistent
> >>> result. But I did notice the extra long subroutine name
> >>> MatSeqAIJCUSPARSESetGenerateTranspose had 37 characters.
> >>> A little research found: In Fortran 90/95 the maximum length was 31
> >>> characters, in Fortran 2003 it is now 63 characters. I found the
> >>> following subroutines with greater than 31 characters
> >>>
> >>> subroutine MatCreateMPIMatConcatenateSeqMat
> >>> subroutine MatFactorFactorizeSchurComplement
> >>> subroutine MatMPIAdjCreateNonemptySubcommMat
> >>> subroutine MatSeqAIJCUSPARSESetGenerateTranspose
> >>> subroutine MatMPIAIJSetUseScalableIncreaseOverlap
> >>> subroutine MatFactorSolveSchurComplementTranspose
> >>>
> >>> I individually ifdef'd them out of the source file and was able to
> >>> compile the files successfully without encountering the ICE.
> >>>
> >>> I'm not exactly sure what maximum subroutine name length the XLF
> >>> compiler allows, but if it is only 31, it would be useful if the compiler
> >>> detected this and issued a message instead of the ICE.
> >>> Adding the option -qlanglvl=2003std or -qlanglvl=2008std produces a bunch
> >>> of other warning messages, but it still encounters the ICE. So, I'm
> >>> uncertain if the subroutine name length is the root of the problem.
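
The check for over-long names described above can be scripted along these lines. This is only a sketch: long_subroutine_names() is a made-up helper, the regex is a simplification that only looks at plain 'subroutine' lines, and the sample source is a two-routine excerpt, not the real petscmat.h90.

```python
import re

# Fortran 90/95 limits names to 31 characters; Fortran 2003 raised it to 63.
F90_MAX_NAME = 31

# Simplified pattern: a line that starts with "subroutine <name>".
# "end subroutine" lines do not match because "end" comes first.
SUB_RE = re.compile(r'^[ \t]*subroutine[ \t]+(\w+)', re.IGNORECASE | re.MULTILINE)

def long_subroutine_names(source, limit=F90_MAX_NAME):
    """Return (name, length) for each subroutine whose name exceeds limit."""
    return [(n, len(n)) for n in SUB_RE.findall(source) if len(n) > limit]

sample = """
      subroutine MatSetLayouts(a,b,c,z)
      end subroutine
      subroutine MatSeqAIJCUSPARSESetGenerateTranspose(a,b,z)
      end subroutine
"""
for name, length in long_subroutine_names(sample):
    print(name, length)  # prints: MatSeqAIJCUSPARSESetGenerateTranspose 37
```

Such a scan only flags candidates. As noted above, the ICE persisted with -qlanglvl=2003std, so name length may not be the cause.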
> >>>
> >>> Is it possible for you to use subroutines that are less than 32
> >>> characters and see if that works for you? Have you used other fortran 90
> >>> compilers and do any of them complain of this?
> >>> Are there any unusual or questionable language constructs used in any of
> >>> the functions mentioned above that may possibly challenge the compiler?
> >>>
> >>> I'll package this up and send it to the IBM XL compiler development team
> >>> for their examination and comment.
> >>>
> >>> Best Regards,
> >>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >>> Roy Musselman
> >>> IBM HPC Application Analyst at Lawrence Livermore National Lab
> >>> email: [email protected] <mailto:[email protected]>
> >>> <mailto:[email protected] <mailto:[email protected]>>
> >>> LLNL office: 925-422-6033
> >>> Cell: 507-358-8895, Home: 507-281-9565
> >>>
> >>> Jacob Faibussowitsch ---02/18/2021 02:17:05 PM---> The most
> >>> recently built version available on the CORAL systems is 3.13.0. (ml load
> >>> petsc/3.13.0) W
> >>>
> >>> From: Jacob Faibussowitsch <[email protected]
> >>> <mailto:[email protected]> <mailto:[email protected]
> >>> <mailto:[email protected]>>>
> >>> To: Roy Musselman <[email protected] <mailto:[email protected]>
> >>> <mailto:[email protected] <mailto:[email protected]>>>
> >>> Cc: "Gyllenhaal, John C." <[email protected]
> >>> <mailto:[email protected]> <mailto:[email protected]
> >>> <mailto:[email protected]>>>
> >>> Date: 02/18/2021 02:17 PM
> >>> Subject: [EXTERNAL] Re: xlf90_r Internal Compiler Error
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> > The most recently built version available on the CORAL systems is 3.13.0.
> >>> > (ml load petsc/3.13.0) Will that work for you?
> >>>
> >>> I am building petsc from source as part of development work on petsc
> >>> itself so modules are unfortunately not useful here.
> >>>
> >>> > The files you sent me do not contain all the dependencies (other mod
> >>> > files) required to reproduce the error.
> >>> > I'll attempt to build version 3.14.4 from scratch and recreate the
> >>> > failing symptom you are observing.
> >>>
> >>> Yes, petsc uses an automated system to generate the fortran files from C
> >>> which goes about 20 rabbit holes deeper than I was willing to dig. Let me
> >>> know if you run into trouble configuring and building petsc, I can point
> >>> you in the right direction. I’ve attached a “reconfigure” script with
> >>> this email, it contains all of the arguments I used to configure petsc
> >>> successfully on Lassen. If you place it into your $PETSC_DIR (i.e. the
> >>> folder titled “petsc” and that contains a “configure” file) and run:
> >>>
> >>> $ python3 ./reconfigure-arch-linux-c-debug.py
> >>>
> >>> It should work. If not, you will have to
> >>>
> >>> $ ./configure --all-the-args --in-the-reconfigure --file
> >>>
> >>> Best regards,
> >>>
> >>> Jacob Faibussowitsch
> >>> (Jacob Fai - booss - oh - vitch)
> >>> Cell: (312) 694-3391
> >>>
> >>> [attachment "reconfigure-arch-linux-c-debug.py" deleted by Roy
> >>> Musselman/Rochester/Contr/IBM]
> >>>
> >>> On Feb 18, 2021, at 15:07, Roy Musselman <[email protected]> wrote:
> >>> Hi Jacob,
> >>>
> >>> The source file appears to come from the PETSc 3.14.4 library. The most
> >>> recently built version available on the CORAL systems is 3.13.0. (ml load
> >>> petsc/3.13.0) Will that work for you?
> >>> The files you sent me do not contain all the dependencies (other mod
> >>> files) required to reproduce the error.
> >>> I'll attempt to build version 3.14.4 from scratch and recreate the
> >>> failing symptom you are observing.
> >>>
> >>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >>> Roy Musselman
> >>> IBM HPC Application Analyst at Lawrence Livermore National Lab
> >>> email: [email protected] <mailto:[email protected]>
> >>> <mailto:[email protected] <mailto:[email protected]>>
> >>> LLNL office: 925-422-6033
> >>> Cell: 507-358-8895, Home: 507-281-9565
> >>>
> >>> Roy Musselman---02/18/2021 11:18:20 AM---I'll take a look.
> >>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Roy Musselman
> >>>
> >>> From: Roy Musselman/Rochester/Contr/IBM
> >>> To: LC Hotline <[email protected] <mailto:[email protected]>
> >>> <mailto:[email protected] <mailto:[email protected]>>>
> >>> Cc: "Gyllenhaal, John C." <[email protected]
> >>> <mailto:[email protected]> <mailto:[email protected]
> >>> <mailto:[email protected]>>>
> >>> Date: 02/18/2021 11:18 AM
> >>> Subject: Re: [EXTERNAL] FW: xlf90_r Internal Compiler Error
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> I'll take a look.
> >>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >>> Roy Musselman
> >>> IBM HPC Application Analyst at Lawrence Livermore National Lab
> >>> email: [email protected] <mailto:[email protected]>
> >>> <mailto:[email protected] <mailto:[email protected]>>
> >>> LLNL office: 925-422-6033
> >>> Cell: 507-358-8895, Home: 507-281-9565
> >>>
> >>>
> >>> LC Hotline ---02/18/2021 11:03:55 AM---Hi John, Roy, Can you
> >>> help this user with the problem that he is seeing when he tries to build
> >>> with
> >>>
> >>> From: LC Hotline <[email protected] <mailto:[email protected]>
> >>> <mailto:[email protected] <mailto:[email protected]>>>
> >>> To: "Gyllenhaal, John C." <[email protected]
> >>> <mailto:[email protected]> <mailto:[email protected]
> >>> <mailto:[email protected]>>>, Roy Musselman <[email protected]
> >>> <mailto:[email protected]><mailto:[email protected]
> >>> <mailto:[email protected]>>>
> >>> Date: 02/18/2021 11:03 AM
> >>> Subject: [EXTERNAL] FW: xlf90_r Internal Compiler Error
> >>>
> >>>
> >>>
> >>> Hi John, Roy,
> >>>
> >>> Can you help this user with the problem that he is seeing when he tries
> >>> to build with xlf90 on Lassen?
> >>>
> >>> Thanks,
> >>> Ryan
> >>> --
> >>> LC Hotline
> >>>
> >>> From: Jacob Faibussowitsch <[email protected]
> >>> <mailto:[email protected]> <mailto:[email protected]
> >>> <mailto:[email protected]>>>
> >>> Date: Wednesday, February 17, 2021 at 5:27 PM
> >>> To: LC Hotline <[email protected] <mailto:[email protected]>
> >>> <mailto:[email protected] <mailto:[email protected]>>>
> >>> Subject: xlf90_r Internal Compiler Error
> >>>
> >>> Hello LC Support,
> >>>
> >>> While compiling my application on Lassen I seem to have run afoul of the
> >>> xlf90 mpi compiler wrapper with the following error:
> >>>
> >>> *** Error in
> >>> `/usr/tce/packages/xl/xl-2020.11.12/xlf/16.1.1/exe/xlfentry': free():
> >>> invalid pointer: 0x0000200001740018 ***
> >>>
> >>> I’m fairly certain this isn’t my fault as this is code that compiles
> >>> regularly on extensive CI/CD under various other compilers and machines,
> >>> but you can never rule it out. I have included a verbose full log of my
> >>> make run (which includes a comprehensive rundown of the environment) as
> >>> well as a separate file containing the error message and stack trace from
> >>> the compiler. Additionally I have also included the file which I believe
> >>> is causing the error. Let me know if there is anything else I should send.
> >>>
> >>> P.S. My list of loaded modules:
> >>>
> >>> Currently Loaded Modules:
> >>> 1) StdEnv (S) 4) cuda/11.1.1 7) valgrind/3.16.1
> >>> 2) clang/ibm-11.0.0 5) python/3.8.2 8) lapack/3.9.0-xl-2020.11.12
> >>> 3) spectrum-mpi/rolling-release 6) cmake/3.18.0 9) hip/3.0.0
> >>>
> >>> Best regards,
> >>>
> >>> Jacob Faibussowitsch
> >>> (Jacob Fai - booss - oh - vitch)
> >>> Cell: (312) 694-3391
> >>>
> >>> [attachment "errorReport.zip" deleted by Roy
> >>> Musselman/Rochester/Contr/IBM]
>
>