Hi Junchao,

I'll test ex53. In the meantime, I am using the following workaround:

1. My program calls MPI initialize once for the entire program.
2. It calls PetscInitialize once for the entire program.
3. It calls SlepcInitialize once for the entire program (I think I can skip the PetscInitialize above).
4. It calls SLEPc multiple times.
5. My program calls MPI finalize before the program ends.
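The workaround above amounts to "initialize lazily on first use, finalize once at shutdown". A minimal sketch of that lifecycle, including the finalize hook that the workaround currently skips, is below. It is standalone C under stated assumptions: `lib_initialize()` and `lib_finalize()` are hypothetical stubs standing in for SlepcInitialize() and SlepcFinalize() (SlepcInitialize() also initializes PETSc when PETSc is not yet initialized, which is why the separate PetscInitialize call can usually be dropped), and `eigensolver_ensure_initialized()`, `eigensolver_shutdown()` and `solve_once()` are hypothetical names for the host program's own wrapper.

```c
#include <assert.h>

/* Hypothetical stand-ins for SlepcInitialize()/SlepcFinalize(). They only
   record calls so the lifecycle can be exercised without PETSc or MPI. */
static int lib_active = 0;
static int lib_init_calls = 0;
static int lib_finalize_calls = 0;

static int lib_initialize(void)
{
  assert(!lib_active);          /* never initialize twice without finalizing */
  lib_active = 1;
  lib_init_calls++;
  return 0;
}

static int lib_finalize(void)
{
  assert(lib_active);           /* only finalize what was initialized */
  lib_active = 0;
  lib_finalize_calls++;
  return 0;
}

/* Called at the top of every eigensolver entry point: initializes the
   library on first use only, so repeated solves share one session. */
static int eigensolver_ensure_initialized(void)
{
  if (lib_active) return 0;
  return lib_initialize();
}

/* Called once from the host program's shutdown path, after the last solve
   and before MPI_Finalize(). A no-op if the solver was never used or if
   the hook runs twice. */
static int eigensolver_shutdown(void)
{
  if (!lib_active) return 0;
  return lib_finalize();
}

/* One solver invocation, as the host program would call it repeatedly. */
static int solve_once(void)
{
  int ierr = eigensolver_ensure_initialized();
  if (ierr) return ierr;
  /* ... create, solve, and destroy the eigenproblem objects here ... */
  return 0;
}
```

In the host program the order would be: MPI_Init, any number of `solve_once()` calls, `eigensolver_shutdown()`, then MPI_Finalize. Because MPI is initialized before PETSc, PetscFinalize() leaves MPI for the application to finalize, so the hook does not conflict with the program's own MPI_Finalize, and it recovers what skipping finalize gives up (freed PETSc data structures and `-log_view` output).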
You can see that I skip PetscFinalize/SlepcFinalize. I am uneasy about skipping them because I am not sure what the consequences are. Can you comment on it?

Thanks,
Sam

On Fri, Jun 26, 2020 at 6:58 PM Junchao Zhang <[email protected]> wrote:

> Did the test included in that commit fail in your environment? You can also change the test by adding calls to SlepcInitialize/SlepcFinalize between PetscInitializeNoPointers/PetscFinalize, as in my previous email.
>
> --Junchao Zhang
>
> On Fri, Jun 26, 2020 at 5:54 PM Sam Guo <[email protected]> wrote:
>
>> Hi Junchao,
>> If you are talking about this commit of yours:
>> https://gitlab.com/petsc/petsc/-/commit/f0463fa09df52ce43e7c5bf47a1c87df0c9e5cbb
>>
>> Recycle keyvals and fix bugs in MPI_Comm creation
>>
>> I think I got it. It fixes the serial case, but the parallel one is still crashing.
>>
>> Thanks,
>> Sam
>>
>> On Fri, Jun 26, 2020 at 3:43 PM Sam Guo <[email protected]> wrote:
>>
>>> Hi Junchao,
>>> I am not ready to upgrade PETSc yet (due to the lengthy technical and legal approval process of our internal policy). Can you send me the diff file so I can apply it to PETSc 3.11.3?
>>>
>>> Thanks,
>>> Sam
>>>
>>> On Fri, Jun 26, 2020 at 3:33 PM Junchao Zhang <[email protected]> wrote:
>>>
>>>> Sam,
>>>> Please discard the original patch I sent you. A better fix is already in maint/master. A test is at src/sys/tests/ex53.c. I modified that test at the end with
>>>>
>>>>   for (i=0; i<500; i++) {
>>>>     ierr = PetscInitializeNoPointers(argc,argv,NULL,help);if (ierr) return ierr;
>>>>     ierr = SlepcInitialize(&argc,&argv,NULL,help);if (ierr) return ierr;
>>>>     ierr = SlepcFinalize();if (ierr) return ierr;
>>>>     ierr = PetscFinalize();if (ierr) return ierr;
>>>>   }
>>>>
>>>> then ran it with multiple MPI ranks, and it ran correctly. So try your program with PETSc master first. If it does not work, see if you can come up with a test example for us.
>>>>
>>>> Thanks.
>>>> --Junchao Zhang
>>>>
>>>> On Fri, Jun 26, 2020 at 3:37 PM Sam Guo <[email protected]> wrote:
>>>>
>>>>> One workaround for me is to call PetscInitialize once for my entire program and skip PetscFinalize (since I don't have a good place to call PetscFinalize before the program ends).
>>>>>
>>>>> On Fri, Jun 26, 2020 at 1:33 PM Sam Guo <[email protected]> wrote:
>>>>>
>>>>>> I get the crash after calling Initialize/Finalize multiple times. Junchao fixed the bug for serial, but parallel still crashes.
>>>>>>
>>>>>> On Fri, Jun 26, 2020 at 1:28 PM Barry Smith <[email protected]> wrote:
>>>>>>
>>>>>>> Ah, so you get the crash the second time you call PetscInitialize()? That is a problem because we do intend to support that capability (but you must call PetscFinalize() each time also).
>>>>>>>
>>>>>>> Barry
>>>>>>>
>>>>>>> On Jun 26, 2020, at 3:25 PM, Sam Guo <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi Barry,
>>>>>>> Thanks for the quick response.
>>>>>>> I will call PetscInitialize once and skip PetscFinalize for now to avoid the crash. The crash is actually in PetscInitialize, not PetscFinalize.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Sam
>>>>>>>
>>>>>>> On Fri, Jun 26, 2020 at 1:21 PM Barry Smith <[email protected]> wrote:
>>>>>>>
>>>>>>>> Sam,
>>>>>>>>
>>>>>>>> You can skip PetscFinalize() so long as you only call PetscInitialize() once. It is not desirable in general to skip the finalize, because PETSc can't free all its data structures and you cannot see the PETSc logging information with -log_view, but in terms of the code running correctly you do not need to call PetscFinalize.
>>>>>>>>
>>>>>>>> If your code crashes in PetscFinalize(), please send the full error output and we can try to help you debug it.
>>>>>>>>
>>>>>>>> Barry
>>>>>>>>
>>>>>>>> On Jun 26, 2020, at 3:14 PM, Sam Guo <[email protected]> wrote:
>>>>>>>>
>>>>>>>> To clarify, we have an MPI wrapper (so we can switch to a different MPI at runtime). I compile PETSc using our MPI wrapper.
>>>>>>>> If I just call PETSc initialize once without calling finalize, it is OK. My question to you is: can I skip finalize? Our program calls mpi_finalize at the end anyway.
>>>>>>>>
>>>>>>>> On Fri, Jun 26, 2020 at 1:09 PM Sam Guo <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Junchao,
>>>>>>>>> Attached please find the configure.log.
>>>>>>>>> I also attach the pinit.c that contains your patch (I am currently using 3.11.3 and have applied your patch to it). Your patch fixes the serial version; the error now occurs in parallel. Here is the error log:
>>>>>>>>>
>>>>>>>>> [1]PETSC ERROR: #1 PetscInitialize() line 969 in ../../../petsc/src/sys/objects/pinit.c
>>>>>>>>> [1]PETSC ERROR: #2 checkError() line 56 in ../../../physics/src/eigensolver/SLEPc.cpp
>>>>>>>>> [1]PETSC ERROR: #3 PetscInitialize() line 966 in ../../../petsc/src/sys/objects/pinit.c
>>>>>>>>> [1]PETSC ERROR: #4 SlepcInitialize() line 262 in ../../../slepc/src/sys/slepcinit.c
>>>>>>>>> [0]PETSC ERROR: #1 PetscInitialize() line 969 in ../../../petsc/src/sys/objects/pinit.c
>>>>>>>>> [0]PETSC ERROR: #2 checkError() line 56 in ../../../physics/src/eigensolver/SLEPc.cpp
>>>>>>>>> [0]PETSC ERROR: #3 PetscInitialize() line 966 in ../../../petsc/src/sys/objects/pinit.c
>>>>>>>>> [0]PETSC ERROR: #4 SlepcInitialize() line 262 in ../../../slepc/src/sys/slepcinit.c
>>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
>>>>>>>>> with errorcode 56.
>>>>>>>>>
>>>>>>>>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>>>>>>>>> You may or may not see output from other processes, depending on
>>>>>>>>> exactly when Open MPI kills them.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Sam
>>>>>>>>>
>>>>>>>>> On Thu, Jun 25, 2020 at 7:37 PM Junchao Zhang <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Sam,
>>>>>>>>>> The MPI_Comm_create_keyval() error was fixed in maint/master. From the error message, it seems you need to configure with --with-log=1. Otherwise, please send your full error stack trace and configure.log.
>>>>>>>>>> Thanks.
>>>>>>>>>> --Junchao Zhang
>>>>>>>>>>
>>>>>>>>>> On Thu, Jun 25, 2020 at 2:18 PM Sam Guo <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Junchao,
>>>>>>>>>>> I have now encountered the same error in parallel. I am wondering if a parallel fix is needed as well.
>>>>>>>>>>> [1]PETSC ERROR: #1 PetscInitialize() line 969 in ../../../petsc/src/sys/objects/pinit.c
>>>>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Jun 20, 2020 at 7:35 PM Sam Guo <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Junchao,
>>>>>>>>>>>> Your patch works.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Sam
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Jun 20, 2020 at 4:23 PM Junchao Zhang <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Jun 20, 2020 at 12:24 PM Barry Smith <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Junchao,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is a good bug fix. It solves the problem when PETSc initialize is called many times.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There is another fix you can make to keep PETSc's mpiuni from running out of attributes inside a single PETSc run:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> int MPI_Comm_create_keyval(MPI_Copy_function *copy_fn,MPI_Delete_function *delete_fn,int *keyval,void *extra_state)
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>   int i;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   if (num_attr >= MAX_ATTR) {
>>>>>>>>>>>>>>     for (i=0; i<num_attr; i++) {
>>>>>>>>>>>>>>       if (!attr_keyval[i].extra_state) {
>>>>>>>>>>>>>
>>>>>>>>>>>>> attr_keyval[i].extra_state is provided by the user (it could be NULL). We cannot rely on it.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>         /* reuse this slot */
>>>>>>>>>>>>>>         attr_keyval[i].extra_state = extra_state;
>>>>>>>>>>>>>>         attr_keyval[i].del = delete_fn;
>>>>>>>>>>>>>>         *keyval = i;
>>>>>>>>>>>>>>         return MPI_SUCCESS;
>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>     return MPIUni_Abort(MPI_COMM_WORLD,1);
>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>   attr_keyval[num_attr].extra_state = extra_state;
>>>>>>>>>>>>>>   attr_keyval[num_attr].del = delete_fn;
>>>>>>>>>>>>>>   *keyval = num_attr++;
>>>>>>>>>>>>>>   return MPI_SUCCESS;
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This will work if the user creates tons of attributes but is constantly deleting some as they create new ones,
So long as the >>>>>>>>>>>>>> number >>>>>>>>>>>>>> outstanding at one time is < MAX_ATTR) >>>>>>>>>>>>>> >>>>>>>>>>>>>> Barry >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Jun 20, 2020, at 10:54 AM, Junchao Zhang < >>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> I don't understand what your session means. Let's try this >>>>>>>>>>>>>> patch >>>>>>>>>>>>>> >>>>>>>>>>>>>> diff --git a/src/sys/mpiuni/mpi.c b/src/sys/mpiuni/mpi.c >>>>>>>>>>>>>> index d559a513..c058265d 100644 >>>>>>>>>>>>>> --- a/src/sys/mpiuni/mpi.c >>>>>>>>>>>>>> +++ b/src/sys/mpiuni/mpi.c >>>>>>>>>>>>>> @@ -283,6 +283,7 @@ int MPI_Finalize(void) >>>>>>>>>>>>>> MPI_Comm_free(&comm); >>>>>>>>>>>>>> comm = MPI_COMM_SELF; >>>>>>>>>>>>>> MPI_Comm_free(&comm); >>>>>>>>>>>>>> + num_attr = 1; /* reset the counter */ >>>>>>>>>>>>>> MPI_was_finalized = 1; >>>>>>>>>>>>>> return MPI_SUCCESS; >>>>>>>>>>>>>> } >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> --Junchao Zhang >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Sat, Jun 20, 2020 at 10:48 AM Sam Guo < >>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Typo: I mean “Assuming initializer is only needed once for >>>>>>>>>>>>>>> entire session” >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Saturday, June 20, 2020, Sam Guo <[email protected]> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Assuming finalizer is only needed once for entire >>>>>>>>>>>>>>>> session(?), I can put initializer into the static block to >>>>>>>>>>>>>>>> call it once but >>>>>>>>>>>>>>>> where do I call finalizer? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Saturday, June 20, 2020, Junchao Zhang < >>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The counter num_attr should be recycled. But first try to >>>>>>>>>>>>>>>>> call PETSc initialize/Finalize only once to see it fixes the >>>>>>>>>>>>>>>>> error. 
>>>>>>>>>>>>>>>>> --Junchao Zhang
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sat, Jun 20, 2020 at 12:48 AM Sam Guo <[email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> To clarify, I call PETSc initialize and PETSc finalize every time I call SLEPc:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   PetscInitializeNoPointers(argc,args,nullptr,nullptr);
>>>>>>>>>>>>>>>>>>   SlepcInitialize(&argc,&args,static_cast<char*>(nullptr),help);
>>>>>>>>>>>>>>>>>>   // calling SLEPc
>>>>>>>>>>>>>>>>>>   SlepcFinalize();
>>>>>>>>>>>>>>>>>>   PetscFinalize();
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Fri, Jun 19, 2020 at 10:32 PM Sam Guo <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Dear PETSc team,
>>>>>>>>>>>>>>>>>>> When I called SLEPc multiple times, I eventually got the following error:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> MPI operation not supported by PETSc's sequential MPI wrappers
>>>>>>>>>>>>>>>>>>> [0]PETSC ERROR: #1 PetscInitialize() line 967 in ../../../petsc/src/sys/objects/pinit.c
>>>>>>>>>>>>>>>>>>> [0]PETSC ERROR: #2 SlepcInitialize() line 262 in ../../../slepc/src/sys/slepcinit.c
>>>>>>>>>>>>>>>>>>> [0]PETSC ERROR: #3 SlepcInitializeNoPointers() line 359 in ../../../slepc/src/sys/slepcinit.c
>>>>>>>>>>>>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>>>>>>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I debugged it: the error comes from the following check in petsc/src/sys/mpiuni/mpi.c
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   if (num_attr >= MAX_ATTR)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> in the function int MPI_Comm_create_keyval(MPI_Copy_function *copy_fn,MPI_Delete_function *delete_fn,int *keyval,void *extra_state)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> num_attr is declared static and keeps increasing every time MPI_Comm_create_keyval is called.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I am using PETSc 3.11.3 but found that 3.13.2 has the same logic.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Is this a bug, or am I not using it correctly?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Sam
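The failure described in the original report (a static `num_attr` that only grows) and Junchao's one-line patch (reset `num_attr` to 1 in MPI_Finalize) can be reproduced with a small standalone model of the sequential wrapper's keyval counter. This is a sketch, not mpiuni code: `MAX_ATTR = 128` and four keyvals per Initialize/Finalize cycle are made-up numbers chosen to keep the arithmetic easy, `model_create_keyval()`, `model_finalize()` and `model_run()` are hypothetical names, and the 500 cycles mirror the loop Junchao added to ex53. It covers only the serial mpiuni path; the parallel failure discussed later in the thread involves a real MPI library and is not modelled here.

```c
#include <assert.h>

/* Illustrative value only; not PETSc's actual constant. */
#define MAX_ATTR 128

static int num_attr = 1;          /* static counter, as described in the report */
static int reset_on_finalize = 0; /* 0 = behaviour reported, 1 = with the reset patch */

/* Hands out the next keyval; returns 1 once the table is exhausted,
   which is the check the report traced the failure to. */
static int model_create_keyval(int *keyval)
{
  if (num_attr >= MAX_ATTR) return 1;
  *keyval = num_attr++;
  return 0;
}

/* Models MPI_Finalize in the sequential wrapper. */
static void model_finalize(void)
{
  if (reset_on_finalize) num_attr = 1;   /* the one-line "reset the counter" patch */
}

/* Runs `cycles` Initialize/Finalize cycles, each creating `per_cycle` keyvals.
   Returns the 1-based index of the first failing cycle, or 0 if all succeed. */
static int model_run(int cycles, int per_cycle, int patched)
{
  int k;
  num_attr = 1;
  reset_on_finalize = patched;
  for (int c = 1; c <= cycles; c++) {
    for (int j = 0; j < per_cycle; j++)
      if (model_create_keyval(&k)) return c;
    model_finalize();
  }
  return 0;
}
```

With these numbers the table holds 127 keyvals (slots 1 to 127). Without the patch, cycles 1 to 31 consume 124 of them and the fourth create in cycle 32 would need a 128th, so `model_run(500, 4, 0)` reports cycle 32. With the reset, every cycle starts again from slot 1 and all 500 cycles succeed.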
