I wonder if gfortran has a similar "bug", just with a larger capacity. That would explain why compiling the Fortran interface files is so much more expensive than compiling all of PETSc.
Barry Smith <[email protected]> writes: > PETSc stacks the Fortran modules in the same way it stacks the C include > files. So the TAO module includes all the Fortran modules below it etc. It > would be nearly impossible to disentangle the bits and pieces without > introducing a more painful user experience. For example use PCTypes, use > PCFunctions, use KSPTypes, .... impossible to use and impossible to maintain. > > This is a completely artificial bug of IBM's own making in their compiler > that we should not have to work around. > > Barry > > >> On Mar 3, 2021, at 12:10 PM, Jacob Faibussowitsch <[email protected]> >> wrote: >> >> Hello All, >> >> I discovered a compiler bug in the IBM xl fortran compiler a few weeks ago >> that would crash the compiler when compiling petsc fortran interfaces. The >> TL;DR of it is that the xl compiler creates a function dictionary for every >> function imported in fortran modules, and since petsc fortran interfaces >> seem to import entire packages writ-large this exceeds the number of >> dictionary entries (2**21): >> >>> The reason for the Internal Compiler Error is because we can't grow an >>> interal dictionary anymore (ie we hit a 2**21 limit). >>> The file contains many module procedures and interfaces that use the same >>> helper module. As a result, we are importing the dictionary entries for >>> that module repeatedly reaching >>> the limit. >>> >>> Can you please give the following source code workaround a try? >>> Since there is already "use petscvecdefdummy" at the module scope, one >>> workaround might be to remove the unnecessary "use petscvecdefdummy" in >>> vecnotequal and vecequals >>> and all similar procedures. >>> >>> For example, the test case has: >>> module petscvecdef >>> use petscvecdefdummy >>> ... >>> function vecnotequal(A,B) >>> use petscvecdefdummy >>> logical vecnotequal >>> type(tVec), intent(in) :: A,B >>> vecnotequal = (A%v .ne. 
B%v) >>> end function >>> function vecequals(A,B) >>> use petscvecdefdummy >>> logical vecequals >>> type(tVec), intent(in) :: A,B >>> vecequals = (A%v .eq. B%v) >>> end function >>> ... >>> end module >>> Another workaround would be to put the procedure definitions from this >>> large module into several submodules. Each submodule would be able to >>> accommodate a dictionary with 2**21 entries. >>> >>> >>> Please let us know if one of the above workarounds resolve the issue. >> >> >> The proposed fix from IBM would be to pull “use moduleXXX” out of >> subroutines or to have our auto-fortran interfaces detect which symbols to >> include from the respective modules and only include those in the >> subroutines. I’m not familiar at all with how the interfaces are generated >> so I don’t even know if this is possible. >>> IBM provided the following additional explanation and example. Can the >>> process used to generate these routines and functions determine the >>> specific symbols required and then use the only keyword or import statement >>> to include them? >>> >>> When factoring out use statements out of module procedures, you can just >>> delete them. But you can't completely remove them from interface blocks. >>> Instead, you can limit them either by using use <module>, only: <symbol> or >>> import <symbol> . if the hundreds of use statements in the program are >>> factored out / limited in this way, that should reduce the dictionary size >>> sufficiently for the program to compile. >>> >>> For example >>> Interface >>> Subroutine VecRestoreArrayReadF90(v,array,ierr) >>> use petscvecdef >>> real(kind=selected_real_kind(10)), pointer :: array(:) >>> integer(kind=selected_int_kind(5)) ierr >>> type(tVec) v >>> End Subroutine >>> End Interface >>> >>> imports all symbols from petscvecdef into the dictionary even though we >>> only need tVec . 
So we can either: >>> >>> Interface >>> Subroutine VecRestoreArrayReadF90(v,array,ierr) >>> use petscvecdef, only: tVec >>> implicit none >>> real(kind=selected_real_kind(10)), pointer :: array(:) >>> integer(kind=selected_int_kind(5)) ierr >>> type(tVec) v >>> End Subroutine >>> End Interface >>> >>> or if use petscvecdef is used in the outer scope, we can: >>> Interface >>> Subroutine VecRestoreArrayReadF90(v,array,ierr) >>> import tVec >>> implicit none >>> real(kind=selected_real_kind(10)), pointer :: array(:) >>> integer(kind=selected_int_kind(5)) ierr >>> type(tVec) v >>> End Subroutine >>> End Interface >>> (The two methods (use, only vs import) are equivalent in terms of impact to >>> the dictionary.) >>> >> >> Is this compiler ~feature~ something that we intend to work around? Thoughts? >> >> Best regards, >> >> Jacob Faibussowitsch >> (Jacob Fai - booss - oh - vitch) >> Cell: (312) 694-3391 >> >>> Begin forwarded message: >>> >>> From: "Roy Musselman" <[email protected] <mailto:[email protected]>> >>> Subject: Re: Case TS005062693 - XLF: ICE in xlfentry compiling a module >>> with 358 subroutines >>> Date: March 3, 2021 at 08:23:17 CST >>> To: Jacob Faibussowitsch <[email protected] >>> <mailto:[email protected]>> >>> Cc: "Gyllenhaal, John C." <[email protected] >>> <mailto:[email protected]>> >>> >>> Hi Jacob, >>> I tried the first suggestion and commented out the use statements called >>> within the functions. However, I hit the following error complaining about >>> specific symbol dependencies provided by the library. >>> >>> .../src/vec/f90-mod/petscvecmod.F90", line 107.37: 1514-084 (S) Identifier >>> a is being declared with type name tvec which has not been defined in a >>> derived type definition. >>> >>> IBM provided the following additional explanation and example. Can the >>> process used to generate these routines and functions determine the >>> specific symbols required and then use the only keyword or import statement >>> to include them? 
>>> >>> When factoring out use statements out of module procedures, you can just >>> delete them. But you can't completely remove them from interface blocks. >>> Instead, you can limit them either by using use <module>, only: <symbol> or >>> import <symbol> . if the hundreds of use statements in the program are >>> factored out / limited in this way, that should reduce the dictionary size >>> sufficiently for the program to compile. >>> >>> For example >>> Interface >>> Subroutine VecRestoreArrayReadF90(v,array,ierr) >>> use petscvecdef >>> real(kind=selected_real_kind(10)), pointer :: array(:) >>> integer(kind=selected_int_kind(5)) ierr >>> type(tVec) v >>> End Subroutine >>> End Interface >>> >>> imports all symbols from petscvecdef into the dictionary even though we >>> only need tVec . So we can either: >>> >>> Interface >>> Subroutine VecRestoreArrayReadF90(v,array,ierr) >>> use petscvecdef, only: tVec >>> implicit none >>> real(kind=selected_real_kind(10)), pointer :: array(:) >>> integer(kind=selected_int_kind(5)) ierr >>> type(tVec) v >>> End Subroutine >>> End Interface >>> >>> or if use petscvecdef is used in the outer scope, we can: >>> Interface >>> Subroutine VecRestoreArrayReadF90(v,array,ierr) >>> import tVec >>> implicit none >>> real(kind=selected_real_kind(10)), pointer :: array(:) >>> integer(kind=selected_int_kind(5)) ierr >>> type(tVec) v >>> End Subroutine >>> End Interface >>> (The two methods (use, only vs import) are equivalent in terms of impact to >>> the dictionary.) 
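A rough way to find candidates for the narrowing IBM describes above is to list every use statement that has no only: clause. What follows is a minimal sketch, not part of PETSc or of IBM's suggestion. The function name list_unqualified_use and the sample file are made up for illustration, the pattern assumes one use statement per line, and it deliberately skips forms such as "use, intrinsic ::".

```shell
#!/bin/bash
# Hypothetical helper (not a PETSc tool): print, with line numbers, every
# "use <module>" line that carries no "only:" clause. A trailing "!" comment
# is tolerated; anything else after the module name (such as ", only: x")
# makes the line fail the pattern, so it is not reported.
list_unqualified_use() {
    grep -n -i -E '^[[:space:]]*use[[:space:]]+[a-z_][a-z0-9_]*[[:space:]]*(!.*)?$' "$1"
}

# Small demonstration on a made-up interface block modelled on the IBM example.
sample=$(mktemp)
cat > "$sample" <<'EOF'
      Interface
        Subroutine VecRestoreArrayReadF90(v,array,ierr)
          use petscvecdef
          type(tVec) v
        End Subroutine
        Subroutine Other(v,ierr)
          use petscvecdef, only: tVec
          type(tVec) v
        End Subroutine
      End Interface
EOF
list_unqualified_use "$sample"
rm -f "$sample"
```

On the sample, only line 3 (the bare "use petscvecdef") is reported; the line that already has "only: tVec" is skipped.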
>>> >>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>> Roy Musselman >>> IBM HPC Application Analyst at Lawrence Livermore National Lab >>> email: [email protected] <mailto:[email protected]> >>> LLNL office: 925-422-6033 >>> Cell: 507-358-8895, Home: 507-281-9565 >>> >>> <graycol.gif>Roy Musselman---02/24/2021 07:08:45 PM---Hi Jacob, I opened >>> the ticket with IBM: case TS005062693 and and the local LLNL Sierra Jira >>> Ticket >>> >>> From: Roy Musselman/Rochester/Contr/IBM >>> To: Jacob Faibussowitsch <[email protected] >>> <mailto:[email protected]>> >>> Cc: "Gyllenhaal, John C." <[email protected] >>> <mailto:[email protected]>> >>> Date: 02/24/2021 07:08 PM >>> Subject: Re: [EXTERNAL] Case TS005062693 - XLF: ICE in xlfentry compiling >>> a module with 358 subroutines >>> >>> >>> >>> Hi Jacob, >>> I opened the ticket with IBM: case TS005062693 and and the local LLNL >>> Sierra Jira Ticket at >>> https://lc.llnl.gov/jira/projects/SIERRA/issues/SIERRA-111?filter=allissues >>> <https://urldefense.com/v3/__https://lc.llnl.gov/jira/projects/SIERRA/issues/SIERRA-111?filter=allissues__;!!DZ3fjg!vDUpTg4q6jg1lQwt37jm9Uzc7MqGrEdrg0wpKgGq9P5JoR3jKrqncOAKyni2BEUYOxQ$> >>> >>> Today IBM provided the response below. I don't know when I'll have time to >>> try it on the reproducer I gave IBM. Perhaps early next week. Can you >>> review this and see if it helps? >>> >>> The reason for the Internal Compiler Error is because we can't grow an >>> interal dictionary anymore (ie we hit a 2**21 limit). >>> The file contains many module procedures and interfaces that use the same >>> helper module. As a result, we are importing the dictionary entries for >>> that module repeatedly reaching >>> the limit. >>> >>> Can you please give the following source code workaround a try? >>> Since there is already "use petscvecdefdummy" at the module scope, one >>> workaround might be to remove the unnecessary "use petscvecdefdummy" in >>> vecnotequal and vecequals >>> and all similar procedures. 
>>> >>> For example, the test case has: >>> module petscvecdef >>> use petscvecdefdummy >>> ... >>> function vecnotequal(A,B) >>> use petscvecdefdummy >>> logical vecnotequal >>> type(tVec), intent(in) :: A,B >>> vecnotequal = (A%v .ne. B%v) >>> end function >>> function vecequals(A,B) >>> use petscvecdefdummy >>> logical vecequals >>> type(tVec), intent(in) :: A,B >>> vecequals = (A%v .eq. B%v) >>> end function >>> ... >>> end module >>> Another workaround would be to put the procedure definitions from this >>> large module into several submodules. Each submodule would be able to >>> accommodate a dictionary with 2**21 entries. >>> >>> >>> Please let us know if one of the above workarounds resolve the issue. >>> >>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>> Roy Musselman >>> IBM HPC Application Analyst at Lawrence Livermore National Lab >>> email: [email protected] <mailto:[email protected]> >>> LLNL office: 925-422-6033 >>> Cell: 507-358-8895, Home: 507-281-9565 >>> >>> >>> <graycol.gif>Roy Musselman---02/21/2021 09:42:55 PM---Hi Jacob, After some >>> more experimentation, I think I may have found what is triggering the ICE. >>> It >>> >>> From: Roy Musselman/Rochester/Contr/IBM >>> To: Jacob Faibussowitsch <[email protected] >>> <mailto:[email protected]>> >>> Cc: "Gyllenhaal, John C." <[email protected] >>> <mailto:[email protected]>> >>> Date: 02/21/2021 09:42 PM >>> Subject: Re: [EXTERNAL] Re: xlf90_r Internal Compiler Error >>> >>> >>> Hi Jacob, >>> >>> After some more experimentation, I think I may have found what is >>> triggering the ICE. It doesn't appear to be related to the subroutine name >>> length. I think the compiler may be hitting an internal limit of the number >>> of subroutines within a module. There are 358 subroutines contained in the >>> expanded petscmatmod.F90. Removing 4 subroutines will allow the compile to >>> complete successfully, so the limit must be 354 subroutines. 
Is it possible >>> for you to bust up petscmatmod into multiple modules? I'll package up the >>> reproducer and pass it on to the compiler development team. >>> >>> I've asked for user feedback a couple years ago, when the IBM Power9 >>> CORAL-1 Sierra systems were deployed, but received minimal responses. DOE >>> is now working with Cray (aka HPE) developing the environment for the >>> CORAL-2 system (El Capitan). I'll pass your request to the LLNL person I >>> know that is dealing with math libraries for CORAL-2. >>> >>> We use the spack tool to download and build petsc and its specified >>> dependencies. I switched between the PETSC versions by changing the >>> PETSCDIR variable in the script I shared with you. I've attached a tar ball >>> containing the scripts used to build PETSc via spack. >>> >>> [attachment "bld-petsc-spack.tgz" deleted by Roy >>> Musselman/Rochester/Contr/IBM] >>> >>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>> Roy Musselman >>> IBM HPC Application Analyst at Lawrence Livermore National Lab >>> email: [email protected] <mailto:[email protected]> >>> LLNL office: 925-422-6033 >>> Cell: 507-358-8895, Home: 507-281-9565 >>> >>> >>> <graycol.gif>Jacob Faibussowitsch ---02/21/2021 12:24:11 PM---Hi Roy, > I'm >>> not sure which projects at LLNL are using PETSc or if they chose to build >>> their own ve >>> >>> From: Jacob Faibussowitsch <[email protected] >>> <mailto:[email protected]>> >>> To: Roy Musselman <[email protected] <mailto:[email protected]>> >>> Cc: "Gyllenhaal, John C." <[email protected] >>> <mailto:[email protected]>> >>> Date: 02/21/2021 12:24 PM >>> Subject: [EXTERNAL] Re: xlf90_r Internal Compiler Error >>> >>> >>> >>> Hi Roy, I'm not sure which projects at LLNL are using PETSc or if they >>> chose to build their own version. Entirely unrelated to our problem, but is >>> it possible to find this out? It would be great if yes, but also completely >>> fine if not. 
PETSc >>> Hi Roy, >>> I'm not sure which projects at LLNL are using PETSc or if they chose to >>> build their own version. >>> Entirely unrelated to our problem, but is it possible to find this out? It >>> would be great if yes, but also completely fine if not. PETSc is >>> potentially undergoing a rather transformative rewrite over the next few >>> years and we’d like to gather current usage data to get a better idea of >>> where PETSc fits into our users workflows. But we aren’t sure how to gather >>> this data (we don’t particularly want to scrape and silently send it off >>> without users consent/knowledge) absent user questionnaires and HPC usage >>> statistics. >>> If you are interested, I can share with you the spack recipes I use to >>> build petsc with hdf5, hypre, and suplerlu-dist. >>> Yes that would be quite useful. I can let it percolate through our dev >>> channels for any other recommendations etc. >>> 3.14.0 and 3.14.1 >>> >>> "../roymuss/spack-stage-petsc-3.14.0-on3lboy4slkz65tsjttgfmwghzky54jj/spack-src/src/vec/f90-mod/petscvecmod.F90", >>> line 9.13: 1514-219 (S) Unable to access module symbol file for module >>> petscisdefdummy. Check path and file permissions of file. Use association >>> not done for this module. >>> 1501-511 Compilation failed for file petscvecmod.F90. >>> How exactly did you switch between versions? PETSc has 2 types of fortran >>> bindings, “ftn-custom” and “ftn-auto” (technically 3 including the F90 >>> files, but those simply call either of the two preceding ones), a copy of >>> which you will find in every src directory. As the names imply ftn-auto is >>> auto generated while ftn-custom is hand-written. 
>>> >>> This also means that the ftn-auto files are __not__ tracked by git, so a >>> simple git checkout [new-tag] may not properly dispose of the old >>> auto-generated files (very rare, but IIRC we made a major enough change to >>> the fortran bindings within the last year to warrant having to "make >>> deletefortranstubs" before rebuilding). >>> Adding the option -qlanglvl=2003std or -qlanglvl=2008std produces a bunch >>> of other warning messages, but it still encounters the ICE. So, I'm >>> uncertain if the subroutine name length is the root of the problem. >>> Our current compiler flag selection philosophy is to require a minimum but >>> choose the maximum available reasonable flag for the compiler (I.e. we >>> require C99, but very often you will find that your code is compiled with >>> C11 or C17 if they are available). It is therefore odd that configure did >>> not use the same methodology for fortran compilers. I will relay this on >>> our side. >>> Is it possible for you to use subroutines that are less than 32 characters >>> and see if that works four you? Have you used other fortran 90 compilers >>> and do any of them complain of this? >>> Of all of the small quirks fortran has this is probably the most esoteric >>> one I’ve come across… I’ve attached a list of all the F90 compilers, and >>> their flags which we use in CI/CD (all of which is run multiple times daily >>> and __must__ pass). I got them all via grep, so there may be some >>> duplicates here or there. As for using shorter names, this is also >>> something we can look at, but since none of the other compilers have had >>> issues with this I’m not sure this is the change to make. >>> Are there any unusual or questionable language constructs used in any of >>> the functions mentioned above that may possibly challenge the compiler? >>> Not that I am aware of, but again I will ask around our dev channels and >>> see if anything comes to mind. 
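For anyone who wants to repeat the name-length check discussed above, here is a minimal sketch that lists procedure names longer than a given limit. It is an illustration only: the function name long_procedure_names and the sample file are hypothetical, the default limit of 31 is the Fortran 90/95 figure quoted in this thread, and the pattern only recognises lines that begin with "subroutine" or "function" (so prefixes such as "recursive" or a type before "function" are missed).

```shell
#!/bin/bash
# Hypothetical helper: print "<length> <name>" for every subroutine or
# function whose name is longer than the limit.
#   $1 = source file
#   $2 = maximum allowed name length (defaults to 31)
long_procedure_names() {
    awk -v max="${2:-31}" '
    {
        # Match case-insensitively, but keep the original spelling of the name.
        if (match(tolower($0), /^[ \t]*(subroutine|function)[ \t]+[a-z][a-z0-9_]*/)) {
            name = substr($0, RSTART, RLENGTH)
            sub(/^[ \t]*[A-Za-z]+[ \t]+/, "", name)   # drop the leading keyword
            if (length(name) > max) print length(name), name
        }
    }' "$1"
}

# Demonstration on three names taken from the lists in this thread.
sample=$(mktemp)
cat > "$sample" <<'EOF'
      subroutine MatSetLayouts(a,b,c,z)
      subroutine MatSeqAIJCUSPARSESetGenerateTranspose(a,b,z)
      subroutine MatCreateMPIMatConcatenateSeqMat(a,b,c,d,z)
EOF
long_procedure_names "$sample"
rm -f "$sample"
```

On the sample it reports the two long names and skips MatSetLayouts. Passing 63 as the second argument checks against the Fortran 2003 limit instead.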
>>> >>> >>> Best regards, >>> >>> Jacob Faibussowitsch >>> (Jacob Fai - booss - oh - vitch) >>> Cell: (312) 694-3391 [attachment "compilerList" deleted by Roy >>> Musselman/Rochester/Contr/IBM] >>> On Feb 20, 2021, at 22:05, Roy Musselman <[email protected] >>> <mailto:[email protected]>> wrote: >>> Hi Jacob, >>> Thanks for letting me know that you are a PETSc developer and that you are >>> testing it on the LLNL lassen system. I've used the spack build tool to >>> build and deploy a few versions on the systems. I'm not sure which projects >>> at LLNL are using PETSc or if they chose to build their own version. I did >>> however provide a single precision version upon request that was integrated >>> with MVAPICH2-MPI instead of the IBM-provided Spectrum-MPI. Here's what's >>> available on the systems today. >>> >>> > ml avail petsc >>> ----------------------------------------------------- >>> /usr/tcetmp/modulefiles/Core >>> ----------------------------------------------------- >>> petsc/default petsc/3.10.2 petsc/3.11.3 petsc/3.13.0 (D) >>> petsc/3.13.1-mvapich2-2020.01.09-xl-2020.03.18.single >>> >>> If you are interested, I can share with you the spack recipes I use to >>> build petsc with hdf5, hypre, and superlu-dist. >>> >>> After several attempts I was able to reproduce the Internal Compiler Error >>> (ICE) that you are seeing using version 3.14.4. I've whittled it down to >>> the petscmatmod.F90 file and its specific dependencies. >>> The following script is what I'm using. Note that in the 2nd set of >>> compiles, the -E option is used to expand all included source files and >>> headers, encapsulating them in a single large source file. This can be >>> used to help isolate the source of the problem.
>>> >>> #!/bin/bash >>> >>> PETSCDIR="../roymuss/spack-stage-petsc-3.14.4-eh5arny7l3cqjlltlfpjp6f4jofbnmz6/spack-src" >>> >>> OPTIONS=" -qmoddir=moddir -I$PETSCDIR/arch-linux-c-opt/include >>> -I$PETSCDIR/include" >>> mkdir -p moddir >>> >>> set -x >>> >>> # Compile original source files including dependencies >>> if [ 0 = 1 ]; then >>> mpif90 -c -g $OPTIONS $PETSCDIR/src/sys/f90-mod/petscsysmod.F90 -o >>> petscsysmod.o >>> mpif90 -c -g $OPTIONS $PETSCDIR/src/vec/f90-mod/petscvecmod.F90 -o >>> petscvecmod.o >>> mpif90 -c -g $OPTIONS $PETSCDIR/src/mat/f90-mod/petscmatmod.F90 -o >>> petscmatmod.o >>> fi >>> >>> # Use -E option to expand source into full source files >>> if [ 0 = 1 ]; then >>> mpif90 -c -g -E $OPTIONS $PETSCDIR/src/sys/f90-mod/petscsysmod.F90 -o >>> full_petscsysmod.F90 >>> mpif90 -c -g -E $OPTIONS $PETSCDIR/src/vec/f90-mod/petscvecmod.F90 -o >>> full_petscvecmod.F90 >>> mpif90 -c -g -E $OPTIONS $PETSCDIR/src/mat/f90-mod/petscmatmod.F90 -o >>> full_petscmatmod.F90 >>> fi >>> >>> # Compile from full source files >>> if [ 1 = 1 ]; then >>> mpif90 -c -g -Imoddir -qmoddir=moddir full_petscsysmod.F90 -o >>> full_petscsysmod.o >>> mpif90 -c -g -Imoddir -qmoddir=moddir full_petscvecmod.F90 -o >>> full_petscvecmod.o >>> mpif90 -V -c -g -Imoddir -qmoddir=moddir full_petscmatmod.F90 -o >>> full_petscmatmod.o >>> fi >>> >>> <eof> >>> >>> PETSc 3.13.6 is the most recent version that did not fail. I tried all >>> subsequent versions and got the following results: >>> >>> 3.14.0 and 3.14.1 >>> >>> "../roymuss/spack-stage-petsc-3.14.0-on3lboy4slkz65tsjttgfmwghzky54jj/spack-src/src/vec/f90-mod/petscvecmod.F90", >>> line 9.13: 1514-219 (S) Unable to access module symbol file for module >>> petscisdefdummy. Check path and file permissions of file. Use association >>> not done for this module. >>> 1501-511 Compilation failed for file petscvecmod.F90. >>> >>> 3.14.2, 3.14.3, and 3.14.4 >>> >>> . . .
>>> ** matnullspaceequals === End of Compilation 8 === >>> *** Error in `/usr/tce/packages/xl/xl-2020.11.12/xlf/16.1.1/exe/xlfentry': >>> free(): invalid pointer: 0x0000200001740018 *** >>> >>> Examining the tail end of petscmatmod.F90 >>> >>> >>> 80 function matnullspaceequals(A,B) >>> 81 use petscmatdefdummy >>> 82 logical matnullspaceequals >>> 83 type(tMatNullSpace), intent(in) :: A,B >>> 84 matnullspaceequals = (A%v .eq. B%v) >>> 85 end function >>> 86 >>> 87 #if defined(_WIN32) && defined(PETSC_USE_SHARED_LIBRARIES) >>> 88 !DEC$ ATTRIBUTES DLLEXPORT::matnotequal >>> 89 !DEC$ ATTRIBUTES DLLEXPORT::matequals >>> 90 !DEC$ ATTRIBUTES DLLEXPORT::matfdcoloringnotequal >>> 91 !DEC$ ATTRIBUTES DLLEXPORT::matfdcoloringequals >>> 92 !DEC$ ATTRIBUTES DLLEXPORT::matnullspacenotequal >>> 93 !DEC$ ATTRIBUTES DLLEXPORT::matnullspaceequals >>> 94 #endif >>> 95 module petscmat >>> 96 use petscmatdef >>> 97 use petscvec >>> 98 #include <../src/mat/f90-mod/petscmat.h90> >>> 99 interface >>> 100 #include <../src/mat/f90-mod/ftn-auto-interfaces/petscmat.h90> >>> 101 end interface >>> 102 end module >>> 103 >>> >>> Compiling the matnullspaceequals function was successful just before >>> hitting the error. The error goes away when removing either or both of the >>> #include lines 98 and 100. Both #include statements are required to produce >>> the error. The 3.13.6 and 3.14.4 version of the file identified in the >>> first #include at line 98 are identical. The file identified in line 100 is >>> different between 3.13.6 and 3.14.4. >>> Just looking at the list of subroutines contained within each version, the >>> following are the differences. 
>>> >>> Old subroutines available in 3.13.6 but removed from 3.14.4 >>> subroutine MatFreeIntermediateDataStructures(a,z) >>> >>> New subroutines available in 3.14.4 but not contained in 3.13.6 >>> subroutine MatDenseReplaceArray(a,b,z) >>> subroutine MatIsShell(a,b,z) >>> subroutine MatRARtMultEqual(a,b,c,d,e,z) >>> subroutine MatScaLAPACKGetBlockSizes(a,b,c,z) >>> subroutine MatScaLAPACKSetBlockSizes(a,b,c,z) >>> subroutine MatSeqAIJCUSPARSESetGenerateTranspose(a,b,z) >>> subroutine MatSeqAIJSetTotalPreallocation(a,b,z) >>> subroutine MatSetLayouts(a,b,c,z) >>> >>> Methodically removing the new subroutines did not provide a consistent >>> result. But I did notice the extra long subroutine name >>> MatSeqAIJCUSPARSESetGenerateTranspose had 37 characters. >>> A little research found: In Fortran 90/95 the maximum length was 31 >>> characters, in Fortran 2003 it is now 63 characters. I found the following >>> subroutines with names longer than 31 characters >>> >>> subroutine MatCreateMPIMatConcatenateSeqMat >>> subroutine MatFactorFactorizeSchurComplement >>> subroutine MatMPIAdjCreateNonemptySubcommMat >>> subroutine MatSeqAIJCUSPARSESetGenerateTranspose >>> subroutine MatMPIAIJSetUseScalableIncreaseOverlap >>> subroutine MatFactorSolveSchurComplementTranspose >>> >>> I individually ifdef'd them out of the source file and was able to compile >>> the files successfully without encountering the ICE. >>> >>> I'm not exactly sure what maximum subroutine name length the XLF >>> compiler allows, but if it is only 31, it would be useful if the compiler >>> detected this and issued a message instead of the ICE. >>> Adding the option -qlanglvl=2003std or -qlanglvl=2008std produces a bunch >>> of other warning messages, but it still encounters the ICE. So, I'm >>> uncertain if the subroutine name length is the root of the problem. >>> >>> Is it possible for you to use subroutine names that are shorter than 32 characters >>> and see if that works for you?
Have you used other fortran 90 compilers >>> and do any of them complain of this? >>> Are there any unusual or questionable language constructs used in any of >>> the functions mentioned above that may possibly challenge the compiler? >>> >>> I'll package this up and send it to the IBM XL compiler development team >>> for their examination and comment. >>> >>> Best Regards, >>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>> Roy Musselman >>> IBM HPC Application Analyst at Lawrence Livermore National Lab >>> email: [email protected] <mailto:[email protected]> >>> LLNL office: 925-422-6033 >>> Cell: 507-358-8895, Home: 507-281-9565 >>> >>> <graycol.gif>Jacob Faibussowitsch ---02/18/2021 02:17:05 PM---> The most >>> recently built version available on the CORAL systems is 3.13.0. (ml load >>> petsc/3.13.0) W >>> >>> From: Jacob Faibussowitsch <[email protected] >>> <mailto:[email protected]>> >>> To: Roy Musselman <[email protected] <mailto:[email protected]>> >>> Cc: "Gyllenhaal, John C." <[email protected] >>> <mailto:[email protected]>> >>> Date: 02/18/2021 02:17 PM >>> Subject: [EXTERNAL] Re: xlf90_r Internal Compiler Error >>> >>> >>> >>> >>> >>> The most recently built version available on the CORAL systems... >>> This Message Is From an External Sender >>> This message came from outside your organization. >>> The most recently built version available on the CORAL systems is 3.13.0. >>> (ml load petsc/3.13.0) Will that work for you? >>> I am building petsc from source as part of development work on petsc itself >>> so modules are unfortunately not useful here. >>> The files you sent me do not contain all the dependencies (other mod files) >>> required to reproduce the error. >>> I'll attempt to build version 3.14.4 from scratch and recreate the failing >>> symptom you are observing. >>> Yes, petsc uses an automated system to generate the fortran files from C >>> which goes about 20 rabbit holes deeper than I was willing to dig. 
Let me >>> know if you run into trouble configuring and building petsc, I can point >>> you in the right direction. I’ve attached a “reconfigure” script with this >>> email, it contains all of the arguments I used to configure petsc >>> successfully on Lassen. If you place it into your $PETSC_DIR (i.e. the >>> folder titled “petsc” and that contains a “configure” file) and run: >>> >>> $ python3 ./reconfigure-arch-linux-c-debug.py >>> >>> It should work. If not, you will have to >>> >>> $ ./configure —all-the-args —in-the-reconfigure —file >>> >>> Best regards, >>> >>> Jacob Faibussowitsch >>> (Jacob Fai - booss - oh - vitch) >>> Cell: (312) 694-3391[attachment "reconfigure-arch-linux-c-debug.py" deleted >>> by Roy Musselman/Rochester/Contr/IBM] >>> On Feb 18, 2021, at 15:07, Roy Musselman <[email protected] >>> <mailto:[email protected]>> wrote: >>> Hi Jacob, >>> >>> The source file appears to come from the PETSc 3.14.4 library. The most >>> recently built version available on the CORAL systems is 3.13.0. (ml load >>> petsc/3.13.0) Will that work for you? >>> The files you sent me do not contain all the dependencies (other mod files) >>> required to reproduce the error. >>> I'll attempt to build version 3.14.4 from scratch and recreate the failing >>> symptom you are observing. >>> >>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>> Roy Musselman >>> IBM HPC Application Analyst at Lawrence Livermore National Lab >>> email: [email protected] <mailto:[email protected]> >>> LLNL office: 925-422-6033 >>> Cell: 507-358-8895, Home: 507-281-9565 >>> >>> <graycol.gif>Roy Musselman---02/18/2021 11:18:20 AM---I'll take a look. >>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Roy Musselman >>> >>> From: Roy Musselman/Rochester/Contr/IBM >>> To: LC Hotline <[email protected] <mailto:[email protected]>> >>> Cc: "Gyllenhaal, John C." 
<[email protected] >>> <mailto:[email protected]>> >>> Date: 02/18/2021 11:18 AM >>> Subject: Re: [EXTERNAL] FW: xlf90_r Internal Compiler Error >>> >>> >>> >>> >>> >>> I'll take a look. >>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>> Roy Musselman >>> IBM HPC Application Analyst at Lawrence Livermore National Lab >>> email: [email protected] <mailto:[email protected]> >>> LLNL office: 925-422-6033 >>> Cell: 507-358-8895, Home: 507-281-9565 >>> >>> >>> <graycol.gif>LC Hotline ---02/18/2021 11:03:55 AM---Hi John, Roy, Can you >>> help this user with the problem that he is seeing when he tries to build >>> with >>> >>> From: LC Hotline <[email protected] <mailto:[email protected]>> >>> To: "Gyllenhaal, John C." <[email protected] >>> <mailto:[email protected]>>, Roy Musselman <[email protected] >>> <mailto:[email protected]>> >>> Date: 02/18/2021 11:03 AM >>> Subject: [EXTERNAL] FW: xlf90_r Internal Compiler Error >>> >>> >>> >>> Hi John, Roy, Can you help this user with the problem that he is... >>> This Message Is From an External Sender >>> This message came from outside your organization. >>> Hi John, Roy, >>> >>> Can you help this user with the problem that he is seeing when he tries to >>> build with xlf90 on Lassen? 
>>> >>> Thanks, >>> Ryan >>> -- >>> LC Hotline >>> >>> From: Jacob Faibussowitsch <[email protected] >>> <mailto:[email protected]>> >>> Date: Wednesday, February 17, 2021 at 5:27 PM >>> To: LC Hotline <[email protected] <mailto:[email protected]>> >>> Subject: xlf90_r Internal Compiler Error >>> >>> Hello LC Support, >>> >>> While compiling my application on Lassen I seem have run afoul of the xlf90 >>> mpi compiler wrapper with the following error: >>> >>> *** Error in `/usr/tce/packages/xl/xl-2020.11.12/xlf/16.1.1/exe/xlfentry': >>> free(): invalid pointer: 0x0000200001740018 *** >>> >>> I’m fairly certain this isn’t my fault as this is code that compiles >>> regularly on extensive CI/CD under various other compilers and machines, >>> but you can never rule it out. I have included a verbose full log of my >>> make run (which includes a comprehensive rundown of the environment) as >>> well as a separate file containing the error message and stack trace from >>> the compiler. Additionally I have also included the file which I believe is >>> causing the error. Let me know if there is anything else I should send. >>> >>> P.S. My list of loaded modules: >>> >>> Currently Loaded Modules: >>> 1) StdEnv (S) 4) cuda/11.1.1 7) valgrind/3.16.1 >>> 2) clang/ibm-11.0.0 5) python/3.8.2 8) lapack/3.9.0-xl-2020.11.12 >>> 3) spectrum-mpi/rolling-release 6) cmake/3.18.0 9) hip/3.0.0 >>> >>> Best regards, >>> >>> Jacob Faibussowitsch >>> (Jacob Fai - booss - oh - vitch) >>> Cell: (312) 694-3391[attachment "errorReport.zip" deleted by Roy >>> Musselman/Rochester/Contr/IBM] >>
