> I'm not sure what would happen if these 'use' statements are removed [what's 
> required and what can be removed?]
> 
> The relevant code that adds this is in 
> lib/petsc/bin/maint/generatefortranstubs.py
> 
>              fd.write('      use petsc'+mansec+'def\n')

I suppose we can run it through CI, see if it breaks? 
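
A minimal sketch of the narrower alternative IBM suggests, written in Python because the generator is a Python script. It scans the body of a generated interface for `type(tXxx)` declarations and emits `use petsc<mansec>def, only: ...` instead of a bare `use`. The helper names `needed_types` and `narrowed_use` are hypothetical and are not part of generatefortranstubs.py, and the regex assumes derived types are always spelled `type(tName)` as in the examples quoted below.

```python
import re

# Matches derived-type declarations such as "type(tVec)". Case-insensitive
# because Fortran is. Assumes the handle types all start with "t".
TYPE_RE = re.compile(r"\btype\s*\(\s*(t\w+)\s*\)", re.IGNORECASE)


def needed_types(body):
    """Return the derived-type names used in a Fortran snippet, first-seen order."""
    seen = {}
    for name in TYPE_RE.findall(body):
        # Fortran names are case-insensitive, so tVec and tvec count once.
        seen.setdefault(name.lower(), name)
    return list(seen.values())


def narrowed_use(mansec, body):
    """Build a 'use petsc<mansec>def' line, restricted with 'only:' when possible.

    Falls back to the unrestricted form when no derived types are found,
    which keeps the current behaviour for anything the regex misses.
    """
    names = needed_types(body)
    line = "      use petsc" + mansec + "def"
    if names:
        line += ", only: " + ", ".join(names)
    return line + "\n"


body = """
          real(kind=selected_real_kind(10)), pointer :: array(:)
          integer(kind=selected_int_kind(5)) ierr
          type(tVec)     v
"""
print(narrowed_use("vec", body), end="")
```

This attributes every type it finds to the one module named by `mansec`; a type defined in a different module would need its own `use` line, which this sketch does not attempt.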

Best regards,

Jacob Faibussowitsch
(Jacob Fai - booss - oh - vitch)
Cell: (312) 694-3391

> On Mar 3, 2021, at 12:49, Satish Balay <[email protected]> wrote:
> 
> On Wed, 3 Mar 2021, Jacob Faibussowitsch wrote:
> 
>> Hello All,
>> 
>> I discovered a compiler bug in the IBM xl fortran compiler a few weeks ago 
>> that would crash the compiler when compiling petsc fortran interfaces. The 
>> TL;DR of it is that the xl compiler creates a function dictionary for every 
>> function imported in fortran modules, and since petsc fortran interfaces 
>> seem to import entire packages writ large, this exceeds the number of 
>> dictionary entries (2**21):
>> 
>>> The reason for the Internal Compiler Error is because we can't grow an 
>>> internal dictionary anymore (i.e. we hit a 2**21 limit).
>>> The file contains many module procedures and interfaces that use the same 
>>> helper module. As a result, we are importing the dictionary entries for 
>>> that module repeatedly reaching 
>>> the limit.
>>> 
>>> Can you please give the following source code workaround a try?
>>> Since there is already "use petscvecdefdummy" at the module scope, one 
>>> workaround might be to remove the unnecessary "use petscvecdefdummy" in 
>>> vecnotequal and vecequals 
>>> and all similar procedures.
>>> 
>>> For example, the test case has:
>>>        module petscvecdef
>>>        use petscvecdefdummy
>>> ...
>>>        function vecnotequal(A,B)
>>>          use petscvecdefdummy
>>>          logical vecnotequal
>>>          type(tVec), intent(in) :: A,B
>>>          vecnotequal = (A%v .ne. B%v)
>>>        end function
>>>        function vecequals(A,B)
>>>          use petscvecdefdummy
>>>          logical vecequals
>>>          type(tVec), intent(in) :: A,B
>>>          vecequals = (A%v .eq. B%v)
>>>        end function
>>> ...
>>> end module
>>> Another workaround would be to put the procedure definitions from this 
>>> large module into several submodules.  Each submodule would be able to 
>>> accommodate a dictionary with 2**21 entries.
>>> 
>>> 
>>> Please let us know if one of the above workarounds resolves the issue.
>> 
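
To get a feel for the numbers, here is a back-of-the-envelope model in Python. It assumes (a simplification, not something IBM stated) that every procedure carrying its own `use` adds one full copy of the module's symbols to the dictionary. The 358 is the subroutine count reported for the expanded petscmatmod.F90 further down the thread.

```python
DICT_LIMIT = 2**21  # internal dictionary limit quoted by IBM


def entries(procedures, symbols_per_use):
    """Dictionary entries if each procedure re-imports the module (assumed model)."""
    return procedures * symbols_per_use


def max_symbols_per_use(procedures, limit=DICT_LIMIT):
    """Largest per-procedure import that still fits under the limit."""
    return limit // procedures


print(DICT_LIMIT)
print(max_symbols_per_use(358))
```

Under this model, restricting each `use` to a single symbol with `only:` brings the total down to roughly the number of procedures, far below the limit.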
>> 
>> The proposed fix from IBM would be to pull “use moduleXXX” out of 
>> subroutines or to have our auto-fortran interfaces detect which symbols to 
>> include from the respective modules and only include those in the 
>> subroutines. I’m not familiar at all with how the interfaces are generated 
>> so I don’t even know if this is possible.
> 
> I'm not sure what would happen if these 'use' statements are removed [what's 
> required and what can be removed?]
> 
> The relevant code that adds this is in 
> lib/petsc/bin/maint/generatefortranstubs.py
> 
>              fd.write('      use petsc'+mansec+'def\n')
> 
> Satish
> 
>>> IBM provided the following additional explanation and example. Can the 
>>> process used to generate these routines and functions determine the 
>>> specific symbols required and then use the only keyword or import statement 
>>> to include them?
>>> 
>>> When factoring out use statements out of module procedures, you can just 
>>> delete them.  But you can't completely remove them from interface blocks.  
>>> Instead, you can limit them either by using use <module>, only: <symbol> or 
>>> import <symbol>. If the hundreds of use statements in the program are 
>>> factored out / limited in this way, that should reduce the dictionary size 
>>> sufficiently for the program to compile.
>>> 
>>> For example
>>>      Interface
>>>        Subroutine VecRestoreArrayReadF90(v,array,ierr)
>>>          use petscvecdef
>>>          real(kind=selected_real_kind(10)), pointer :: array(:)
>>>          integer(kind=selected_int_kind(5)) ierr
>>>          type(tVec)     v
>>>        End Subroutine
>>>      End Interface
>>> 
>>> imports all symbols from petscvecdef into the dictionary even though we 
>>> only need tVec .  So we can either:
>>> 
>>>      Interface
>>>        Subroutine VecRestoreArrayReadF90(v,array,ierr)
>>>          use petscvecdef, only: tVec
>>>          implicit none
>>>          real(kind=selected_real_kind(10)), pointer :: array(:)
>>>          integer(kind=selected_int_kind(5)) ierr
>>>          type(tVec)     v
>>>        End Subroutine
>>>      End Interface
>>> 
>>> or if use petscvecdef is used in the outer scope, we can:
>>>      Interface
>>>        Subroutine VecRestoreArrayReadF90(v,array,ierr)
>>>          import tVec
>>>          implicit none
>>>          real(kind=selected_real_kind(10)), pointer :: array(:)
>>>          integer(kind=selected_int_kind(5)) ierr
>>>          type(tVec)     v
>>>        End Subroutine
>>>      End Interface
>>> (The two methods (use, only vs import) are equivalent in terms of impact to 
>>> the dictionary.)
>>> 
>> 
>> Is this compiler ~feature~ something that we intend to work around? Thoughts?
>> 
>> Best regards,
>> 
>> Jacob Faibussowitsch
>> (Jacob Fai - booss - oh - vitch)
>> Cell: (312) 694-3391
>> 
>>> Begin forwarded message:
>>> 
>>> From: "Roy Musselman" <[email protected]>
>>> Subject: Re: Case TS005062693 - XLF: ICE in xlfentry compiling a module 
>>> with 358 subroutines
>>> Date: March 3, 2021 at 08:23:17 CST
>>> To: Jacob Faibussowitsch <[email protected]>
>>> Cc: "Gyllenhaal, John C." <[email protected]>
>>> 
>>> Hi Jacob, 
>>> I tried the first suggestion and commented out the use statements called 
>>> within the functions. However, I hit the following error complaining about 
>>> specific symbol dependencies provided by the library.
>>> 
>>> .../src/vec/f90-mod/petscvecmod.F90", line 107.37: 1514-084 (S) Identifier 
>>> a is being declared with type name tvec which has not been defined in a 
>>> derived type definition. 
>>> 
>>> IBM provided the following additional explanation and example. Can the 
>>> process used to generate these routines and functions determine the 
>>> specific symbols required and then use the only keyword or import statement 
>>> to include them?
>>> 
>>> When factoring out use statements out of module procedures, you can just 
>>> delete them.  But you can't completely remove them from interface blocks.  
>>> Instead, you can limit them either by using use <module>, only: <symbol> or 
>>> import <symbol>. If the hundreds of use statements in the program are 
>>> factored out / limited in this way, that should reduce the dictionary size 
>>> sufficiently for the program to compile.
>>> 
>>> For example
>>>      Interface
>>>        Subroutine VecRestoreArrayReadF90(v,array,ierr)
>>>          use petscvecdef
>>>          real(kind=selected_real_kind(10)), pointer :: array(:)
>>>          integer(kind=selected_int_kind(5)) ierr
>>>          type(tVec)     v
>>>        End Subroutine
>>>      End Interface
>>> 
>>> imports all symbols from petscvecdef into the dictionary even though we 
>>> only need tVec .  So we can either:
>>> 
>>>      Interface
>>>        Subroutine VecRestoreArrayReadF90(v,array,ierr)
>>>          use petscvecdef, only: tVec
>>>          implicit none
>>>          real(kind=selected_real_kind(10)), pointer :: array(:)
>>>          integer(kind=selected_int_kind(5)) ierr
>>>          type(tVec)     v
>>>        End Subroutine
>>>      End Interface
>>> 
>>> or if use petscvecdef is used in the outer scope, we can:
>>>      Interface
>>>        Subroutine VecRestoreArrayReadF90(v,array,ierr)
>>>          import tVec
>>>          implicit none
>>>          real(kind=selected_real_kind(10)), pointer :: array(:)
>>>          integer(kind=selected_int_kind(5)) ierr
>>>          type(tVec)     v
>>>        End Subroutine
>>>      End Interface
>>> (The two methods (use, only vs import) are equivalent in terms of impact to 
>>> the dictionary.)
>>> 
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> Roy Musselman
>>> IBM HPC Application Analyst at Lawrence Livermore National Lab
>>> email: [email protected] <mailto:[email protected]> 
>>> <mailto:[email protected] <mailto:[email protected]>>
>>> LLNL office: 925-422-6033
>>> Cell: 507-358-8895, Home: 507-281-9565
>>> 
>>> Roy Musselman---02/24/2021 07:08:45 PM---Hi Jacob, I opened the ticket with 
>>> IBM: case TS005062693 and the local LLNL Sierra Jira Ticket
>>> 
>>> From:  Roy Musselman/Rochester/Contr/IBM
>>> To:  Jacob Faibussowitsch <[email protected] 
>>> <mailto:[email protected]> <mailto:[email protected] 
>>> <mailto:[email protected]>>>
>>> Cc:  "Gyllenhaal, John C." <[email protected] 
>>> <mailto:[email protected]> <mailto:[email protected] 
>>> <mailto:[email protected]>>>
>>> Date:  02/24/2021 07:08 PM
>>> Subject:  Re: [EXTERNAL] Case TS005062693 - XLF: ICE in xlfentry compiling 
>>> a module with 358 subroutines
>>> 
>>> 
>>> 
>>> Hi Jacob, 
>>> I opened the ticket with IBM: case TS005062693 and the local LLNL 
>>> Sierra Jira Ticket at
>>> https://lc.llnl.gov/jira/projects/SIERRA/issues/SIERRA-111?filter=allissues
>>> 
>>> Today IBM provided the response below. I don't know when I'll have time to 
>>> try it on the reproducer I gave IBM. Perhaps early next week. Can you 
>>> review this and see if it helps? 
>>> 
>>> The reason for the Internal Compiler Error is because we can't grow an 
>>> internal dictionary anymore (i.e. we hit a 2**21 limit).
>>> The file contains many module procedures and interfaces that use the same 
>>> helper module. As a result, we are importing the dictionary entries for 
>>> that module repeatedly reaching 
>>> the limit.
>>> 
>>> Can you please give the following source code workaround a try?
>>> Since there is already "use petscvecdefdummy" at the module scope, one 
>>> workaround might be to remove the unnecessary "use petscvecdefdummy" in 
>>> vecnotequal and vecequals 
>>> and all similar procedures.
>>> 
>>> For example, the test case has:
>>>        module petscvecdef
>>>        use petscvecdefdummy
>>> ...
>>>        function vecnotequal(A,B)
>>>          use petscvecdefdummy
>>>          logical vecnotequal
>>>          type(tVec), intent(in) :: A,B
>>>          vecnotequal = (A%v .ne. B%v)
>>>        end function
>>>        function vecequals(A,B)
>>>          use petscvecdefdummy
>>>          logical vecequals
>>>          type(tVec), intent(in) :: A,B
>>>          vecequals = (A%v .eq. B%v)
>>>        end function
>>> ...
>>> end module
>>> Another workaround would be to put the procedure definitions from this 
>>> large module into several submodules.  Each submodule would be able to 
>>> accommodate a dictionary with 2**21 entries.
>>> 
>>> 
>>> Please let us know if one of the above workarounds resolves the issue.
>>> 
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> Roy Musselman
>>> IBM HPC Application Analyst at Lawrence Livermore National Lab
>>> email: [email protected] <mailto:[email protected]> 
>>> <mailto:[email protected] <mailto:[email protected]>>
>>> LLNL office: 925-422-6033
>>> Cell: 507-358-8895, Home: 507-281-9565
>>> 
>>> 
>>> Roy Musselman---02/21/2021 09:42:55 PM---Hi Jacob, After some more 
>>> experimentation, I think I may have found what is triggering the ICE. It
>>> 
>>> From:  Roy Musselman/Rochester/Contr/IBM
>>> To:  Jacob Faibussowitsch <[email protected] 
>>> <mailto:[email protected]> <mailto:[email protected] 
>>> <mailto:[email protected]>>>
>>> Cc:  "Gyllenhaal, John C." <[email protected] 
>>> <mailto:[email protected]> <mailto:[email protected] 
>>> <mailto:[email protected]>>>
>>> Date:  02/21/2021 09:42 PM
>>> Subject:  Re: [EXTERNAL] Re: xlf90_r Internal Compiler Error
>>> 
>>> 
>>> Hi Jacob, 
>>> 
>>> After some more experimentation, I think I may have found what is 
>>> triggering the ICE. It doesn't appear to be related to the subroutine name 
>>> length. I think the compiler may be hitting an internal limit of the number 
>>> of subroutines within a module. There are 358 subroutines contained in the 
>>> expanded petscmatmod.F90. Removing 4 subroutines will allow the compile to 
>>> complete successfully, so the limit must be 354 subroutines. Is it possible 
>>> for you to bust up petscmatmod into multiple modules? I'll package up the 
>>> reproducer and pass it on to the compiler development team.
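
A count like the 358 above can be checked with a few lines of Python. This is a rough sketch: the helper name `subroutine_names` is made up for illustration, and the regex assumes each definition starts on its own line with the keyword `subroutine`, which matches the generated interfaces shown in this thread but is not a real Fortran parser.

```python
import re

# Matches the opening line of a subroutine definition. "End Subroutine" is
# skipped because the keyword must be the first token on the line.
SUB_RE = re.compile(r"^[ \t]*subroutine\s+(\w+)", re.IGNORECASE | re.MULTILINE)


def subroutine_names(source):
    """Return the names of all subroutines defined in a Fortran source string."""
    return SUB_RE.findall(source)


sample = """
      Interface
        Subroutine VecRestoreArrayReadF90(v,array,ierr)
        End Subroutine
        subroutine MatIsShell(a,b,z)
        end subroutine
      End Interface
"""
names = subroutine_names(sample)
print(len(names), names)
```

Run against the preprocessed file (the `-E` output described later in the thread), `len(subroutine_names(text))` gives the figure to compare before and after removing routines.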
>>> 
>>> I asked for user feedback a couple of years ago, when the IBM Power9 
>>> CORAL-1 Sierra systems were deployed, but received minimal responses. DOE 
>>> is now working with Cray (aka HPE) developing the environment for the 
>>> CORAL-2 system (El Capitan). I'll pass your request to the LLNL person I 
>>> know that is dealing with math libraries for CORAL-2.
>>> 
>>> We use the spack tool to download and build petsc and its specified 
>>> dependencies. I switched between the PETSC versions by changing the 
>>> PETSCDIR variable in the script I shared with you. I've attached a tar ball 
>>> containing the scripts used to build PETSc via spack.
>>> 
>>> [attachment "bld-petsc-spack.tgz" deleted by Roy 
>>> Musselman/Rochester/Contr/IBM] 
>>> 
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> Roy Musselman
>>> IBM HPC Application Analyst at Lawrence Livermore National Lab
>>> email: [email protected] <mailto:[email protected]> 
>>> <mailto:[email protected] <mailto:[email protected]>>
>>> LLNL office: 925-422-6033
>>> Cell: 507-358-8895, Home: 507-281-9565
>>> 
>>> 
>>> Jacob Faibussowitsch ---02/21/2021 12:24:11 PM---Hi Roy, > I'm not sure 
>>> which projects at LLNL are using PETSc or if they chose to build their own 
>>> ve
>>> 
>>> From:  Jacob Faibussowitsch <[email protected] 
>>> <mailto:[email protected]> <mailto:[email protected] 
>>> <mailto:[email protected]>>>
>>> To:  Roy Musselman <[email protected] <mailto:[email protected]> 
>>> <mailto:[email protected] <mailto:[email protected]>>>
>>> Cc:  "Gyllenhaal, John C." <[email protected] 
>>> <mailto:[email protected]> <mailto:[email protected] 
>>> <mailto:[email protected]>>>
>>> Date:  02/21/2021 12:24 PM
>>> Subject:  [EXTERNAL] Re: xlf90_r Internal Compiler Error
>>> 
>>> 
>>> 
>>> Hi Roy,
>>> 
>>> > I'm not sure which projects at LLNL are using PETSc or if they chose to 
>>> > build their own version.
>>> 
>>> Entirely unrelated to our problem, but is it possible to find this out? It 
>>> would be great if yes, but also completely fine if not. PETSc is 
>>> potentially undergoing a rather transformative rewrite over the next few 
>>> years and we’d like to gather current usage data to get a better idea of 
>>> where PETSc fits into our users’ workflows. But we aren’t sure how to gather 
>>> this data (we don’t particularly want to scrape and silently send it off 
>>> without users’ consent/knowledge) absent user questionnaires and HPC usage 
>>> statistics.
>>> 
>>> > If you are interested, I can share with you the spack recipes I use to 
>>> > build petsc with hdf5, hypre, and superlu-dist.
>>> 
>>> Yes that would be quite useful. I can let it percolate through our dev 
>>> channels for any other recommendations etc.
>>> 
>>> > 3.14.0 and 3.14.1
>>> > 
>>> > "../roymuss/spack-stage-petsc-3.14.0-on3lboy4slkz65tsjttgfmwghzky54jj/spack-src/src/vec/f90-mod/petscvecmod.F90",
>>> >  line 9.13: 1514-219 (S) Unable to access module symbol file for module 
>>> > petscisdefdummy. Check path and file permissions of file. Use association 
>>> > not done for this module.
>>> > 1501-511 Compilation failed for file petscvecmod.F90.
>>> 
>>> How exactly did you switch between versions? PETSc has 2 types of fortran 
>>> bindings, “ftn-custom” and “ftn-auto” (technically 3 including the F90 
>>> files, but those simply call either of the two preceding ones), a copy of 
>>> which you will find in every src directory. As the names imply ftn-auto is 
>>> auto generated while ftn-custom is hand-written. 
>>> 
>>> This also means that the ftn-auto files are __not__ tracked by git, so a 
>>> simple git checkout [new-tag] may not properly dispose of the old 
>>> auto-generated files (very rare, but IIRC we made a major enough change to 
>>> the fortran bindings within the last year to warrant having to "make 
>>> deletefortranstubs" before rebuilding).
>>> 
>>> > Adding the option -qlanglvl=2003std or -qlanglvl=2008std produces a bunch 
>>> > of other warning messages, but it still encounters the ICE. So, I'm 
>>> > uncertain if the subroutine name length is the root of the problem. 
>>> 
>>> Our current compiler flag selection philosophy is to require a minimum but 
>>> choose the maximum available reasonable flag for the compiler (i.e. we 
>>> require C99, but very often you will find that your code is compiled with 
>>> C11 or C17 if they are available). It is therefore odd that configure did 
>>> not use the same methodology for fortran compilers. I will relay this on 
>>> our side.
>>> 
>>> > Is it possible for you to use subroutines that are less than 32 characters 
>>> > and see if that works for you? Have you used other fortran 90 compilers 
>>> > and do any of them complain of this? 
>>> 
>>> Of all of the small quirks fortran has this is probably the most esoteric 
>>> one I’ve come across… I’ve attached a list of all the F90 compilers, and 
>>> their flags which we use in CI/CD (all of which is run multiple times daily 
>>> and __must__ pass). I got them all via grep, so there may be some 
>>> duplicates here or there. As for using shorter names, this is also 
>>> something we can look at, but since none of the other compilers have had 
>>> issues with this I’m not sure this is the change to make.
>>> 
>>> > Are there any unusual or questionable language constructs used in any of 
>>> > the functions mentioned above that may possibly challenge the compiler? 
>>> 
>>> Not that I am aware of, but again I will ask around our dev channels and 
>>> see if anything comes to mind.
>>> 
>>> 
>>> Best regards,
>>> 
>>> Jacob Faibussowitsch
>>> (Jacob Fai - booss - oh - vitch)
>>> Cell: (312) 694-3391
>>> [attachment "compilerList" deleted by Roy Musselman/Rochester/Contr/IBM]
>>> 
>>> On Feb 20, 2021, at 22:05, Roy Musselman <[email protected] 
>>> <mailto:[email protected]> <mailto:[email protected] 
>>> <mailto:[email protected]>>> wrote:
>>> Hi Jacob,
>>> Thanks for letting me know that you are a PETSc developer and that you are 
>>> testing it on the LLNL lassen system. I've used the spack build tool to 
>>> build and deploy a few versions on the systems. I'm not sure which projects 
>>> at LLNL are using PETSc or if they chose to build their own version. I did 
>>> however provide a single precision version upon request that was integrated 
>>> with MVAPICH2-MPI instead of the IBM-provided Spectrum-MPI. Here's what's 
>>> available on the systems today.
>>> 
>>>> ml avail petsc
>>> ----------------------------------------------------- 
>>> /usr/tcetmp/modulefiles/Core 
>>> -----------------------------------------------------
>>> petsc/default petsc/3.10.2 petsc/3.11.3 petsc/3.13.0 (D)  
>>> petsc/3.13.1-mvapich2-2020.01.09-xl-2020.03.18.single
>>> 
>>> If you are interested, I can share with you the spack recipes I use to 
>>> build petsc with hdf5, hypre, and superlu-dist.
>>> 
>>> After several attempts I was able to reproduce the Internal Compiler Error 
>>> (ICE) that you are seeing using version 3.14.4. I've whittled it down to 
>>> the petscmatmod.F90 file and its specific dependencies. 
>>> The following script is what I'm using. Note that in the 2nd set of 
>>> compiles, the -E option is used to expand all included source files and 
>>> headers and encapsulate them in a single large source file. This can be 
>>> used to help isolate the source of the problem.  
>>> 
>>> #!/bin/bash
>>> 
>>> PETSCDIR="../roymuss/spack-stage-petsc-3.14.4-eh5arny7l3cqjlltlfpjp6f4jofbnmz6/spack-src"
>>> 
>>> OPTIONS=" -qmoddir=moddir -I$PETSCDIR/arch-linux-c-opt/include -I$PETSCDIR/include"
>>> mkdir -p moddir
>>> 
>>> set -x 
>>> 
>>> # Compile original source files including dependencies
>>> if [ 0 = 1 ]; then
>>> mpif90 -c -g $OPTIONS $PETSCDIR/src/sys/f90-mod/petscsysmod.F90 -o petscsysmod.o
>>> mpif90 -c -g $OPTIONS $PETSCDIR/src/vec/f90-mod/petscvecmod.F90 -o petscvecmod.o
>>> mpif90 -c -g $OPTIONS $PETSCDIR/src/mat/f90-mod/petscmatmod.F90 -o petscmatmod.o
>>> fi
>>> 
>>> # Use -E option to expand source into full source files
>>> if [ 0 = 1 ]; then
>>> mpif90 -c -g -E $OPTIONS $PETSCDIR/src/sys/f90-mod/petscsysmod.F90 -o full_petscsysmod.F90
>>> mpif90 -c -g -E $OPTIONS $PETSCDIR/src/vec/f90-mod/petscvecmod.F90 -o full_petscvecmod.F90
>>> mpif90 -c -g -E $OPTIONS $PETSCDIR/src/mat/f90-mod/petscmatmod.F90 -o full_petscmatmod.F90
>>> fi
>>> 
>>> # Compile from full source files
>>> if [ 1 = 1 ]; then
>>> mpif90 -c -g -Imoddir -qmoddir=moddir full_petscsysmod.F90 -o full_petscsysmod.o
>>> mpif90 -c -g -Imoddir -qmoddir=moddir full_petscvecmod.F90 -o full_petscvecmod.o
>>> mpif90 -V -c -g -Imoddir -qmoddir=moddir full_petscmatmod.F90 -o full_petscmatmod.o
>>> fi
>>> 
>>> <eof>
>>> 
>>> PETSc 3.13.6 is the most recent version that did not fail. I tried all 
>>> subsequent versions and got the following results: 
>>> 
>>> 3.14.0 and 3.14.1
>>> 
>>> "../roymuss/spack-stage-petsc-3.14.0-on3lboy4slkz65tsjttgfmwghzky54jj/spack-src/src/vec/f90-mod/petscvecmod.F90",
>>>  line 9.13: 1514-219 (S) Unable to access module symbol file for module 
>>> petscisdefdummy. Check path and file permissions of file. Use association 
>>> not done for this module.
>>> 1501-511 Compilation failed for file petscvecmod.F90.
>>> 
>>> 3.14.2, 3.14.3, and 3.14.4
>>> 
>>> . . .
>>> ** matnullspaceequals === End of Compilation 8 ===
>>> *** Error in `/usr/tce/packages/xl/xl-2020.11.12/xlf/16.1.1/exe/xlfentry': 
>>> free(): invalid pointer: 0x0000200001740018 ***
>>> 
>>> Examining the tail end of petscmatmod.F90
>>> 
>>> 
>>> 80 function matnullspaceequals(A,B)
>>> 81 use petscmatdefdummy
>>> 82 logical matnullspaceequals
>>> 83 type(tMatNullSpace), intent(in) :: A,B
>>> 84 matnullspaceequals = (A%v .eq. B%v)
>>> 85 end function
>>> 86 
>>> 87 #if defined(_WIN32) && defined(PETSC_USE_SHARED_LIBRARIES)
>>> 88 !DEC$ ATTRIBUTES DLLEXPORT::matnotequal
>>> 89 !DEC$ ATTRIBUTES DLLEXPORT::matequals
>>> 90 !DEC$ ATTRIBUTES DLLEXPORT::matfdcoloringnotequal
>>> 91 !DEC$ ATTRIBUTES DLLEXPORT::matfdcoloringequals
>>> 92 !DEC$ ATTRIBUTES DLLEXPORT::matnullspacenotequal
>>> 93 !DEC$ ATTRIBUTES DLLEXPORT::matnullspaceequals
>>> 94 #endif
>>> 95 module petscmat
>>> 96 use petscmatdef
>>> 97 use petscvec
>>> 98 #include <../src/mat/f90-mod/petscmat.h90>
>>> 99 interface
>>> 100 #include <../src/mat/f90-mod/ftn-auto-interfaces/petscmat.h90>
>>> 101 end interface
>>> 102 end module
>>> 103 
>>> 
>>> Compiling the matnullspaceequals function was successful just before 
>>> hitting the error. The error goes away when removing either or both of the 
>>> #include lines 98 and 100. Both #include statements are required to produce 
>>> the error. The 3.13.6 and 3.14.4 version of the file identified in the 
>>> first #include at line 98 are identical. The file identified in line 100 is 
>>> different between 3.13.6 and 3.14.4.
>>> Just looking at the list of subroutines contained within each version, the 
>>> following are the differences. 
>>> 
>>> Old subroutines available in 3.13.6 but removed from 3.14.4
>>> subroutine MatFreeIntermediateDataStructures(a,z)
>>> 
>>> New subroutines available in 3.14.4 but not contained in 3.13.6 
>>> subroutine MatDenseReplaceArray(a,b,z)
>>> subroutine MatIsShell(a,b,z)
>>> subroutine MatRARtMultEqual(a,b,c,d,e,z)
>>> subroutine MatScaLAPACKGetBlockSizes(a,b,c,z)
>>> subroutine MatScaLAPACKSetBlockSizes(a,b,c,z)
>>> subroutine MatSeqAIJCUSPARSESetGenerateTranspose(a,b,z)
>>> subroutine MatSeqAIJSetTotalPreallocation(a,b,z)
>>> subroutine MatSetLayouts(a,b,c,z)
>>> 
>>> Methodically removing the new subroutines did not provide a consistent 
>>> result. But I did notice the extra long subroutine name 
>>> MatSeqAIJCUSPARSESetGenerateTranspose had 37 characters.
>>> A little research found: In Fortran 90/95 the maximum length was 31 
>>> characters, in Fortran 2003 it is now 63 characters. I found the following 
>>> subroutines with greater than 31 characters
>>> 
>>> subroutine MatCreateMPIMatConcatenateSeqMat
>>> subroutine MatFactorFactorizeSchurComplement
>>> subroutine MatMPIAdjCreateNonemptySubcommMat
>>> subroutine MatSeqAIJCUSPARSESetGenerateTranspose
>>> subroutine MatMPIAIJSetUseScalableIncreaseOverlap
>>> subroutine MatFactorSolveSchurComplementTranspose
>>> 
>>> I individually ifdef'd them out of the source file and was able to compile 
>>> the files successfully without encountering the ICE. 
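
The name-length check described above is easy to script. A small Python sketch follows; the function name `long_names` is made up for illustration, and the two limits are the Fortran 90/95 and Fortran 2003 values mentioned in this message.

```python
F95_MAX_NAME = 31    # Fortran 90/95 identifier limit
F2003_MAX_NAME = 63  # Fortran 2003 identifier limit


def long_names(names, limit=F95_MAX_NAME):
    """Return (name, length) pairs for names longer than the given limit."""
    return [(n, len(n)) for n in names if len(n) > limit]


names = [
    "MatIsShell",
    "MatSetLayouts",
    "MatSeqAIJCUSPARSESetGenerateTranspose",
    "MatFactorSolveSchurComplementTranspose",
]
for name, length in long_names(names):
    print(length, name)
```

Passing `limit=F2003_MAX_NAME` shows whether a name would still be a problem for a compiler running at the 2003 language level.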
>>> 
>>> I'm not exactly sure what maximum subroutine name length the XLF 
>>> compiler allows, but if it is only 31, it would be useful if the compiler 
>>> detected this and issued a message instead of the ICE.
>>> Adding the option -qlanglvl=2003std or -qlanglvl=2008std produces a bunch 
>>> of other warning messages, but it still encounters the ICE. So, I'm 
>>> uncertain if the subroutine name length is the root of the problem. 
>>> 
>>> Is it possible for you to use subroutines that are less than 32 characters 
>>> and see if that works for you? Have you used other fortran 90 compilers 
>>> and do any of them complain of this? 
>>> Are there any unusual or questionable language constructs used in any of 
>>> the functions mentioned above that may possibly challenge the compiler? 
>>> 
>>> I'll package this up and send it to the IBM XL compiler development team 
>>> for their examination and comment. 
>>> 
>>> Best Regards,
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> Roy Musselman
>>> IBM HPC Application Analyst at Lawrence Livermore National Lab
>>> email: [email protected] <mailto:[email protected]> 
>>> <mailto:[email protected] <mailto:[email protected]>>
>>> LLNL office: 925-422-6033
>>> Cell: 507-358-8895, Home: 507-281-9565
>>> 
>>> Jacob Faibussowitsch ---02/18/2021 02:17:05 PM---> The most 
>>> recently built version available on the CORAL systems is 3.13.0. (ml load 
>>> petsc/3.13.0) W
>>> 
>>> From:  Jacob Faibussowitsch <[email protected] 
>>> <mailto:[email protected]> <mailto:[email protected] 
>>> <mailto:[email protected]>>>
>>> To:  Roy Musselman <[email protected] <mailto:[email protected]> 
>>> <mailto:[email protected] <mailto:[email protected]>>>
>>> Cc:  "Gyllenhaal, John C." <[email protected] 
>>> <mailto:[email protected]> <mailto:[email protected] 
>>> <mailto:[email protected]>>>
>>> Date:  02/18/2021 02:17 PM
>>> Subject:  [EXTERNAL] Re: xlf90_r Internal Compiler Error
>>> 
>>> 
>>> 
>>> 
>>> 
>>> > The most recently built version available on the CORAL systems is 3.13.0. 
>>> > (ml load petsc/3.13.0) Will that work for you?
>>> 
>>> I am building petsc from source as part of development work on petsc itself 
>>> so modules are unfortunately not useful here.
>>> 
>>> > The files you sent me do not contain all the dependencies (other mod files) 
>>> > required to reproduce the error. 
>>> > I'll attempt to build version 3.14.4 from scratch and recreate the failing 
>>> > symptom you are observing.
>>> 
>>> Yes, petsc uses an automated system to generate the fortran files from C 
>>> which goes about 20 rabbit holes deeper than I was willing to dig. Let me 
>>> know if you run into trouble configuring and building petsc, I can point 
>>> you in the right direction. I’ve attached a “reconfigure” script with this 
>>> email, it contains all of the arguments I used to configure petsc 
>>> successfully on Lassen. If you place it into your $PETSC_DIR (i.e. the 
>>> folder titled “petsc” and that contains a “configure” file) and run:
>>> 
>>> $ python3 ./reconfigure-arch-linux-c-debug.py
>>> 
>>> It should work. If not, you will have to 
>>> 
>>> $ ./configure --all-the-args --in-the-reconfigure --file
>>> 
>>> Best regards,
>>> 
>>> Jacob Faibussowitsch
>>> (Jacob Fai - booss - oh - vitch)
>>> Cell: (312) 694-3391
>>> [attachment "reconfigure-arch-linux-c-debug.py" deleted by Roy Musselman/Rochester/Contr/IBM]
>>> 
>>> On Feb 18, 2021, at 15:07, Roy Musselman <[email protected] 
>>> <mailto:[email protected]> <mailto:[email protected] 
>>> <mailto:[email protected]>>> wrote:
>>> Hi Jacob,
>>> 
>>> The source file appears to come from the PETSc 3.14.4 library. The most 
>>> recently built version available on the CORAL systems is 3.13.0. (ml load 
>>> petsc/3.13.0) Will that work for you?
>>> The files you sent me do not contain all the dependencies (other mod files) 
>>> required to reproduce the error. 
>>> I'll attempt to build version 3.14.4 from scratch and recreate the failing 
>>> symptom you are observing.
>>> 
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> Roy Musselman
>>> IBM HPC Application Analyst at Lawrence Livermore National Lab
>>> email: [email protected] <mailto:[email protected]> 
>>> <mailto:[email protected] <mailto:[email protected]>>
>>> LLNL office: 925-422-6033
>>> Cell: 507-358-8895, Home: 507-281-9565
>>> 
>>> Roy Musselman---02/18/2021 11:18:20 AM---I'll take a look. 
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Roy Musselman
>>> 
>>> From: Roy Musselman/Rochester/Contr/IBM
>>> To: LC Hotline <[email protected] <mailto:[email protected]> 
>>> <mailto:[email protected] <mailto:[email protected]>>>
>>> Cc: "Gyllenhaal, John C." <[email protected] 
>>> <mailto:[email protected]> <mailto:[email protected] 
>>> <mailto:[email protected]>>>
>>> Date: 02/18/2021 11:18 AM
>>> Subject: Re: [EXTERNAL] FW: xlf90_r Internal Compiler Error
>>> 
>>> 
>>> 
>>> 
>>> 
>>> I'll take a look. 
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> Roy Musselman
>>> IBM HPC Application Analyst at Lawrence Livermore National Lab
>>> email: [email protected] <mailto:[email protected]> 
>>> <mailto:[email protected] <mailto:[email protected]>>
>>> LLNL office: 925-422-6033
>>> Cell: 507-358-8895, Home: 507-281-9565
>>> 
>>> 
>>> LC Hotline ---02/18/2021 11:03:55 AM---Hi John, Roy, Can you 
>>> help this user with the problem that he is seeing when he tries to build 
>>> with
>>> 
>>> From: LC Hotline <[email protected] <mailto:[email protected]> 
>>> <mailto:[email protected] <mailto:[email protected]>>>
>>> To: "Gyllenhaal, John C." <[email protected] 
>>> <mailto:[email protected]> <mailto:[email protected] 
>>> <mailto:[email protected]>>>, Roy Musselman <[email protected] 
>>> <mailto:[email protected]><mailto:[email protected] 
>>> <mailto:[email protected]>>>
>>> Date: 02/18/2021 11:03 AM
>>> Subject: [EXTERNAL] FW: xlf90_r Internal Compiler Error
>>> 
>>> 
>>> 
>>> Hi John, Roy,
>>> 
>>> Can you help this user with the problem that he is seeing when he tries to 
>>> build with xlf90 on Lassen?
>>> 
>>> Thanks,
>>> Ryan
>>> --
>>> LC Hotline
>>> 
>>> From: Jacob Faibussowitsch <[email protected] 
>>> <mailto:[email protected]> <mailto:[email protected] 
>>> <mailto:[email protected]>>>
>>> Date: Wednesday, February 17, 2021 at 5:27 PM
>>> To: LC Hotline <[email protected] <mailto:[email protected]> 
>>> <mailto:[email protected] <mailto:[email protected]>>>
>>> Subject: xlf90_r Internal Compiler Error
>>> 
>>> Hello LC Support, 
>>> 
>>> While compiling my application on Lassen I seem to have run afoul of the xlf90 
>>> mpi compiler wrapper with the following error:
>>> 
>>> *** Error in `/usr/tce/packages/xl/xl-2020.11.12/xlf/16.1.1/exe/xlfentry': 
>>> free(): invalid pointer: 0x0000200001740018 ***
>>> 
>>> I’m fairly certain this isn’t my fault as this is code that compiles 
>>> regularly on extensive CI/CD under various other compilers and machines, 
>>> but you can never rule it out. I have included a verbose full log of my 
>>> make run (which includes a comprehensive rundown of the environment) as 
>>> well as a separate file containing the error message and stack trace from 
>>> the compiler. Additionally I have also included the file which I believe is 
>>> causing the error. Let me know if there is anything else I should send.
>>> 
>>> P.S. My list of loaded modules:
>>> 
>>> Currently Loaded Modules:
>>> 1) StdEnv (S) 4) cuda/11.1.1 7) valgrind/3.16.1
>>> 2) clang/ibm-11.0.0 5) python/3.8.2 8) lapack/3.9.0-xl-2020.11.12
>>> 3) spectrum-mpi/rolling-release 6) cmake/3.18.0 9) hip/3.0.0
>>> 
>>> Best regards,
>>> 
>>> Jacob Faibussowitsch
>>> (Jacob Fai - booss - oh - vitch)
>>> Cell: (312) 694-3391
>>> [attachment "errorReport.zip" deleted by Roy Musselman/Rochester/Contr/IBM]
