Did the test included in that commit fail in your environment? You can also change the test by adding calls to SlepcInitialize/SlepcFinalize between PetscInitializeNoPointers/PetscFinalize as in my previous email.
--Junchao Zhang

On Fri, Jun 26, 2020 at 5:54 PM Sam Guo <[email protected]> wrote:

> Hi Junchao,
>    If you are talking about this commit of yours
> https://gitlab.com/petsc/petsc/-/commit/f0463fa09df52ce43e7c5bf47a1c87df0c9e5cbb
>
> Recycle keyvals and fix bugs in MPI_Comm creation
>
> I think I got it. It fixes the serial one but the parallel one is still
> crashing.
>
> Thanks,
> Sam
>
> On Fri, Jun 26, 2020 at 3:43 PM Sam Guo <[email protected]> wrote:
>
>> Hi Junchao,
>>    I am not ready to upgrade petsc yet (due to the lengthy technical and
>> legal approval process of our internal policy). Can you send me the diff
>> file so I can apply it to petsc 3.11.3?
>>
>> Thanks,
>> Sam
>>
>> On Fri, Jun 26, 2020 at 3:33 PM Junchao Zhang <[email protected]> wrote:
>>
>>> Sam,
>>>    Please discard the original patch I sent you. A better fix is already in
>>> maint/master. A test is at src/sys/tests/ex53.c
>>>    I modified that test at the end with
>>>
>>>   for (i=0; i<500; i++) {
>>>     ierr = PetscInitializeNoPointers(argc,argv,NULL,help);if (ierr) return ierr;
>>>     ierr = SlepcInitialize(&argc,&argv,NULL,help);if (ierr) return ierr;
>>>     ierr = SlepcFinalize();if (ierr) return ierr;
>>>     ierr = PetscFinalize();if (ierr) return ierr;
>>>   }
>>>
>>> then I ran it with multiple MPI ranks and it ran correctly. So try your
>>> program with petsc master first. If that does not work, see if you can
>>> come up with a test example for us.
>>>
>>>  Thanks.
>>> --Junchao Zhang
>>>
>>> On Fri, Jun 26, 2020 at 3:37 PM Sam Guo <[email protected]> wrote:
>>>
>>>> One workaround for me is to call PetscInitialize once for my entire
>>>> program and skip PetscFinalize (since I don't have a good place to call
>>>> PetscFinalize before ending the program).
>>>>
>>>> On Fri, Jun 26, 2020 at 1:33 PM Sam Guo <[email protected]> wrote:
>>>>
>>>>> I get the crash after calling Initialize/Finalize multiple times.
>>>>> Junchao fixed the bug for serial but parallel still crashes.
>>>>>
>>>>> On Fri, Jun 26, 2020 at 1:28 PM Barry Smith <[email protected]> wrote:
>>>>>
>>>>>>    Ah, so you get the crash the second time you call
>>>>>> PetscInitialize()? That is a problem because we do intend to support
>>>>>> that capability (but you must call PetscFinalize() each time also).
>>>>>>
>>>>>>   Barry
>>>>>>
>>>>>> On Jun 26, 2020, at 3:25 PM, Sam Guo <[email protected]> wrote:
>>>>>>
>>>>>> Hi Barry,
>>>>>>    Thanks for the quick response.
>>>>>>    I will call PetscInitialize once and skip the PetscFinalize for
>>>>>> now to avoid the crash. The crash is actually in PetscInitialize, not
>>>>>> PetscFinalize.
>>>>>>
>>>>>> Thanks,
>>>>>> Sam
>>>>>>
>>>>>> On Fri, Jun 26, 2020 at 1:21 PM Barry Smith <[email protected]> wrote:
>>>>>>
>>>>>>>   Sam,
>>>>>>>
>>>>>>>   You can skip PetscFinalize() so long as you only call
>>>>>>> PetscInitialize() once. It is not desirable in general to skip the
>>>>>>> finalize because PETSc can't free all its data structures and you
>>>>>>> cannot see the PETSc logging information with -log_view, but in terms
>>>>>>> of the code running correctly you do not need to call PetscFinalize.
>>>>>>>
>>>>>>>   If your code crashes in PetscFinalize() please send the full
>>>>>>> error output and we can try to help you debug it.
>>>>>>>
>>>>>>>   Barry
>>>>>>>
>>>>>>> On Jun 26, 2020, at 3:14 PM, Sam Guo <[email protected]> wrote:
>>>>>>>
>>>>>>> To clarify, we have an MPI wrapper (so we can switch to a different
>>>>>>> MPI at runtime). I compile petsc using our MPI wrapper.
>>>>>>> If I just call PETSc initialize once without calling finalize, it
>>>>>>> is ok. My question to you is: can I skip finalize?
>>>>>>> Our program calls mpi_finalize at the end anyway.
>>>>>>>
>>>>>>> On Fri, Jun 26, 2020 at 1:09 PM Sam Guo <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Junchao,
>>>>>>>>    Attached please find the configure.log.
>>>>>>>>    I also attach the pinit.c which contains your patch (I am
>>>>>>>> currently using 3.11.3. I've applied your patch to 3.11.3). Your patch
>>>>>>>> fixes the serial version. The error now is about the parallel.
>>>>>>>>    Here is the error log:
>>>>>>>>
>>>>>>>> [1]PETSC ERROR: #1 PetscInitialize() line 969 in ../../../petsc/src/sys/objects/pinit.c
>>>>>>>> [1]PETSC ERROR: #2 checkError() line 56 in ../../../physics/src/eigensolver/SLEPc.cpp
>>>>>>>> [1]PETSC ERROR: #3 PetscInitialize() line 966 in ../../../petsc/src/sys/objects/pinit.c
>>>>>>>> [1]PETSC ERROR: #4 SlepcInitialize() line 262 in ../../../slepc/src/sys/slepcinit.c
>>>>>>>> [0]PETSC ERROR: #1 PetscInitialize() line 969 in ../../../petsc/src/sys/objects/pinit.c
>>>>>>>> [0]PETSC ERROR: #2 checkError() line 56 in ../../../physics/src/eigensolver/SLEPc.cpp
>>>>>>>> [0]PETSC ERROR: #3 PetscInitialize() line 966 in ../../../petsc/src/sys/objects/pinit.c
>>>>>>>> [0]PETSC ERROR: #4 SlepcInitialize() line 262 in ../../../slepc/src/sys/slepcinit.c
>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
>>>>>>>> with errorcode 56.
>>>>>>>>
>>>>>>>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>>>>>>>> You may or may not see output from other processes, depending on
>>>>>>>> exactly when Open MPI kills them.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Sam
>>>>>>>>
>>>>>>>> On Thu, Jun 25, 2020 at 7:37 PM Junchao Zhang <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Sam,
>>>>>>>>>    The MPI_Comm_create_keyval() error was fixed in maint/master.
>>>>>>>>> From the error message, it seems you need to configure --with-log=1
>>>>>>>>> Otherwise, please send your full error stack trace and
>>>>>>>>> configure.log.
>>>>>>>>>   Thanks.
>>>>>>>>> --Junchao Zhang
>>>>>>>>>
>>>>>>>>> On Thu, Jun 25, 2020 at 2:18 PM Sam Guo <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Junchao,
>>>>>>>>>>    I now encountered the same error with parallel. I am wondering
>>>>>>>>>> if there is a need for parallel fix as well.
>>>>>>>>>>
>>>>>>>>>> [1]PETSC ERROR: #1 PetscInitialize() line 969 in ../../../petsc/src/sys/objects/pinit.c
>>>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>>>>
>>>>>>>>>> On Sat, Jun 20, 2020 at 7:35 PM Sam Guo <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Junchao,
>>>>>>>>>>>    Your patch works.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Sam
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Jun 20, 2020 at 4:23 PM Junchao Zhang <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Jun 20, 2020 at 12:24 PM Barry Smith <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>    Junchao,
>>>>>>>>>>>>>
>>>>>>>>>>>>>      This is a good bug fix. It solves the problem when PETSc
>>>>>>>>>>>>> initialize is called many times.
>>>>>>>>>>>>>
>>>>>>>>>>>>>      There is another fix you can do to limit PETSc mpiuni
>>>>>>>>>>>>> running out of attributes inside a single PETSc run:
>>>>>>>>>>>>>
>>>>>>>>>>>>> int MPI_Comm_create_keyval(MPI_Copy_function *copy_fn,MPI_Delete_function *delete_fn,int *keyval,void *extra_state)
>>>>>>>>>>>>> {
>>>>>>>>>>>>>   int i;
>>>>>>>>>>>>>
>>>>>>>>>>>>>   if (num_attr >= MAX_ATTR){
>>>>>>>>>>>>>     for (i=0; i<num_attr; i++) {
>>>>>>>>>>>>>       if (!attr_keyval[i].extra_state) {
>>>>>>>>>>>>>
>>>>>>>>>>>> attr_keyval[i].extra_state is provided by user (could be NULL).
>>>>>>>>>>>> We can not rely on it.
>>>>>>>>>>>>
>>>>>>>>>>>>>         /* reuse this slot */
>>>>>>>>>>>>>         attr_keyval[i].extra_state = extra_state;
>>>>>>>>>>>>>         attr_keyval[i].del = delete_fn;
>>>>>>>>>>>>>         *keyval = i;
>>>>>>>>>>>>>         return MPI_SUCCESS;
>>>>>>>>>>>>>       }
>>>>>>>>>>>>>     }
>>>>>>>>>>>>>     return MPIUni_Abort(MPI_COMM_WORLD,1);
>>>>>>>>>>>>>   }
>>>>>>>>>>>>>   attr_keyval[num_attr].extra_state = extra_state;
>>>>>>>>>>>>>   attr_keyval[num_attr].del = delete_fn;
>>>>>>>>>>>>>   *keyval = num_attr++;
>>>>>>>>>>>>>   return MPI_SUCCESS;
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>>   This will work if the user creates tons of attributes but is
>>>>>>>>>>>>> constantly deleting some as they create new ones, so long as the
>>>>>>>>>>>>> number outstanding at one time is < MAX_ATTR.
>>>>>>>>>>>>>
>>>>>>>>>>>>>   Barry
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Jun 20, 2020, at 10:54 AM, Junchao Zhang <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't understand what your session means. Let's try this patch
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/src/sys/mpiuni/mpi.c b/src/sys/mpiuni/mpi.c
>>>>>>>>>>>>> index d559a513..c058265d 100644
>>>>>>>>>>>>> --- a/src/sys/mpiuni/mpi.c
>>>>>>>>>>>>> +++ b/src/sys/mpiuni/mpi.c
>>>>>>>>>>>>> @@ -283,6 +283,7 @@ int MPI_Finalize(void)
>>>>>>>>>>>>>    MPI_Comm_free(&comm);
>>>>>>>>>>>>>    comm = MPI_COMM_SELF;
>>>>>>>>>>>>>    MPI_Comm_free(&comm);
>>>>>>>>>>>>> +  num_attr = 1; /* reset the counter */
>>>>>>>>>>>>>    MPI_was_finalized = 1;
>>>>>>>>>>>>>    return MPI_SUCCESS;
>>>>>>>>>>>>>  }
>>>>>>>>>>>>>
>>>>>>>>>>>>> --Junchao Zhang
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Jun 20, 2020 at 10:48 AM Sam Guo <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Typo: I mean “Assuming initializer is only needed once for
>>>>>>>>>>>>>> entire session”
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Saturday, June 20, 2020, Sam Guo <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Assuming finalizer is only needed once for entire
>>>>>>>>>>>>>>> session(?), I can put initializer into the static block to
>>>>>>>>>>>>>>> call it once but where do I call finalizer?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Saturday, June 20, 2020, Junchao Zhang <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The counter num_attr should be recycled. But first try to
>>>>>>>>>>>>>>>> call PETSc initialize/Finalize only once to see if it fixes
>>>>>>>>>>>>>>>> the error.
>>>>>>>>>>>>>>>> --Junchao Zhang
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sat, Jun 20, 2020 at 12:48 AM Sam Guo <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> To clarify, I call PETSc initialize and PETSc finalize
>>>>>>>>>>>>>>>>> every time I call SLEPc:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   PetscInitializeNoPointers(argc,args,nullptr,nullptr);
>>>>>>>>>>>>>>>>>   SlepcInitialize(&argc,&args,static_cast<char*>(nullptr),help);
>>>>>>>>>>>>>>>>>   //calling slepc
>>>>>>>>>>>>>>>>>   SlepcFinalize();
>>>>>>>>>>>>>>>>>   PetscFinalize();
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, Jun 19, 2020 at 10:32 PM Sam Guo <[email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Dear PETSc team,
>>>>>>>>>>>>>>>>>>    When I called SLEPc multiple times, I eventually got
>>>>>>>>>>>>>>>>>> the following error:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> MPI operation not supported by PETSc's sequential MPI wrappers
>>>>>>>>>>>>>>>>>> [0]PETSC ERROR: #1 PetscInitialize() line 967 in ../../../petsc/src/sys/objects/pinit.c
>>>>>>>>>>>>>>>>>> [0]PETSC ERROR: #2 SlepcInitialize() line 262 in ../../../slepc/src/sys/slepcinit.c
>>>>>>>>>>>>>>>>>> [0]PETSC ERROR: #3 SlepcInitializeNoPointers() line 359 in ../../../slepc/src/sys/slepcinit.c
>>>>>>>>>>>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>>>>>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I debugged: it is because of the following in
>>>>>>>>>>>>>>>>>> petsc/src/sys/mpiuni/mpi.c
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   if (num_attr >= MAX_ATTR)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> in function int MPI_Comm_create_keyval(MPI_Copy_function *copy_fn,MPI_Delete_function *delete_fn,int *keyval,void *extra_state)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> num_attr is declared static and keeps increasing every
>>>>>>>>>>>>>>>>>> time MPI_Comm_create_keyval is called.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I am using petsc 3.11.3 but found 3.13.2 has the
>>>>>>>>>>>>>>>>>> same logic.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Is this a bug, or am I not using it correctly?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Sam
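
The failure mode diagnosed at the bottom of the thread (a static num_attr counter that only grows, so repeated initialize/finalize cycles eventually hit MAX_ATTR) can be illustrated without PETSc. The following is only a minimal sketch under stated assumptions, not the code from the maint/master commit: KeyvalSlot, keyval_create, keyval_free, keyval_reset, the active flag and the small MAX_ATTR value are all hypothetical names and values chosen for illustration. It combines the two ideas discussed above: reusing freed slots, and resetting the counter at finalize time. An explicit in-use flag is used because, as noted in the thread, extra_state is user-provided and may be NULL.

```c
#include <stddef.h>

#define MAX_ATTR 8 /* illustrative limit only; the real value in mpi.c may differ */

/* One entry of a toy keyval table (hypothetical layout, not the mpiuni struct). */
typedef struct {
  void *extra_state; /* user data; may legitimately be NULL */
  int   active;      /* explicit in-use flag, so a NULL extra_state is not misread as "free" */
} KeyvalSlot;

static KeyvalSlot slots[MAX_ATTR];
static int        num_attr = 1; /* starts at 1 because the quoted patch resets the counter to 1 */

/* Allocate a keyval. Returns 0 on success, -1 if every slot is in use. */
int keyval_create(int *keyval, void *extra_state)
{
  int i;

  /* Prefer a previously freed slot so the counter does not grow without bound. */
  for (i = 1; i < num_attr; i++) {
    if (!slots[i].active) {
      slots[i].extra_state = extra_state;
      slots[i].active      = 1;
      *keyval              = i;
      return 0;
    }
  }
  if (num_attr >= MAX_ATTR) return -1; /* the table is genuinely full */

  slots[num_attr].extra_state = extra_state;
  slots[num_attr].active      = 1;
  *keyval                     = num_attr++;
  return 0;
}

/* Release a keyval so its slot can be reused. Returns 0 on success, -1 on a bad handle. */
int keyval_free(int *keyval)
{
  if (*keyval < 1 || *keyval >= num_attr || !slots[*keyval].active) return -1;
  slots[*keyval].extra_state = NULL;
  slots[*keyval].active      = 0;
  *keyval                    = -1; /* invalidate the caller's handle */
  return 0;
}

/* Finalize-time reset, analogous in spirit to the "num_attr = 1" line in the quoted patch. */
void keyval_reset(void)
{
  int i;

  for (i = 0; i < MAX_ATTR; i++) {
    slots[i].extra_state = NULL;
    slots[i].active      = 0;
  }
  num_attr = 1;
}
```

In this sketch, calling keyval_reset() at the end of every simulated initialize/finalize cycle keeps an arbitrary number of cycles from exhausting the table, while slot reuse in keyval_create() covers the case of many create/free pairs within a single run, provided fewer than MAX_ATTR keyvals are outstanding at once.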
