I get the crash after calling Initialize/Finalize multiple times. Junchao fixed the bug for serial but parallel still crashes.
On Fri, Jun 26, 2020 at 1:28 PM Barry Smith <[email protected]> wrote:

>
> Ah, so you get the crash the second time you call PetscInitialize()?
> That is a problem because we do intend to support that capability (but you
> must call PetscFinalize() each time also).
>
> Barry
>
>
> On Jun 26, 2020, at 3:25 PM, Sam Guo <[email protected]> wrote:
>
> Hi Barry,
> Thanks for the quick response.
> I will call PetscInitialize once and skip the PetscFinalize for now to
> avoid the crash. The crash is actually in PetscInitialize, not
> PetscFinalize.
>
> Thanks,
> Sam
>
> On Fri, Jun 26, 2020 at 1:21 PM Barry Smith <[email protected]> wrote:
>
>>
>> Sam,
>>
>> You can skip PetscFinalize() so long as you only call PetscInitialize()
>> once. It is not desirable in general to skip the finalize because PETSc
>> can't free all its data structures and you cannot see the PETSc logging
>> information with -log_view but in terms of the code running correctly you
>> do not need to call PetscFinalize.
>>
>> If your code crashes in PetscFinalize() please send the full error
>> output and we can try to help you debug it.
>>
>>
>> Barry
>>
>> On Jun 26, 2020, at 3:14 PM, Sam Guo <[email protected]> wrote:
>>
>> To clarify, we have an MPI wrapper (so we can switch to a different MPI at
>> runtime). I compile petsc using our MPI wrapper.
>> If I just call PETSc initialize once without calling finalize, it is ok.
>> My question to you is: can I skip finalize?
>> Our program calls mpi_finalize at end anyway.
>>
>> On Fri, Jun 26, 2020 at 1:09 PM Sam Guo <[email protected]> wrote:
>>
>>> Hi Junchao,
>>> Attached please find the configure.log.
>>> I also attach the pinit.c which contains your patch (I am currently
>>> using 3.11.3. I've applied your patch to 3.11.3). Your patch fixes the
>>> serial version. The error now is about the parallel.
>>> Here is the error log:
>>>
>>> [1]PETSC ERROR: #1 PetscInitialize() line 969 in
>>> ../../../petsc/src/sys/objects/pinit.c
>>> [1]PETSC ERROR: #2 checkError() line 56 in
>>> ../../../physics/src/eigensolver/SLEPc.cpp
>>> [1]PETSC ERROR: #3 PetscInitialize() line 966 in
>>> ../../../petsc/src/sys/objects/pinit.c
>>> [1]PETSC ERROR: #4 SlepcInitialize() line 262 in
>>> ../../../slepc/src/sys/slepcinit.c
>>> [0]PETSC ERROR: #1 PetscInitialize() line 969 in
>>> ../../../petsc/src/sys/objects/pinit.c
>>> [0]PETSC ERROR: #2 checkError() line 56 in
>>> ../../../physics/src/eigensolver/SLEPc.cpp
>>> [0]PETSC ERROR: #3 PetscInitialize() line 966 in
>>> ../../../petsc/src/sys/objects/pinit.c
>>> [0]PETSC ERROR: #4 SlepcInitialize() line 262 in
>>> ../../../slepc/src/sys/slepcinit.c
>>> PETSC ERROR: Logging has not been enabled.
>>> You might have forgotten to call PetscInitialize().
>>> PETSC ERROR: Logging has not been enabled.
>>> You might have forgotten to call PetscInitialize().
>>>
>>> --------------------------------------------------------------------------
>>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
>>> with errorcode 56.
>>>
>>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>>> You may or may not see output from other processes, depending on
>>> exactly when Open MPI kills them.
>>>
>>> Thanks,
>>> Sam
>>>
>>> On Thu, Jun 25, 2020 at 7:37 PM Junchao Zhang <[email protected]>
>>> wrote:
>>>
>>>> Sam,
>>>> The MPI_Comm_create_keyval() error was fixed in maint/master. From
>>>> the error message, it seems you need to configure --with-log=1
>>>> Otherwise, please send your full error stack trace and configure.log.
>>>> Thanks.
>>>> --Junchao Zhang
>>>>
>>>>
>>>> On Thu, Jun 25, 2020 at 2:18 PM Sam Guo <[email protected]> wrote:
>>>>
>>>>> Hi Junchao,
>>>>> I now encountered the same error with parallel. I am wondering if
>>>>> there is a need for parallel fix as well.
>>>>> [1]PETSC ERROR: #1 PetscInitialize() line 969 in
>>>>> ../../../petsc/src/sys/objects/pinit.c
>>>>> PETSC ERROR: Logging has not been enabled.
>>>>> You might have forgotten to call PetscInitialize().
>>>>> PETSC ERROR: Logging has not been enabled.
>>>>> You might have forgotten to call PetscInitialize().
>>>>>
>>>>> On Sat, Jun 20, 2020 at 7:35 PM Sam Guo <[email protected]> wrote:
>>>>>
>>>>>> Hi Junchao,
>>>>>> Your patch works.
>>>>>>
>>>>>> Thanks,
>>>>>> Sam
>>>>>>
>>>>>> On Sat, Jun 20, 2020 at 4:23 PM Junchao Zhang <[email protected]> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Jun 20, 2020 at 12:24 PM Barry Smith <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Junchao,
>>>>>>>>
>>>>>>>> This is a good bug fix. It solves the problem when PETSc
>>>>>>>> initialize is called many times.
>>>>>>>>
>>>>>>>> There is another fix you can do to limit PETSc mpiuni running
>>>>>>>> out of attributes inside a single PETSc run:
>>>>>>>>
>>>>>>>>
>>>>>>>> int MPI_Comm_create_keyval(MPI_Copy_function *copy_fn,MPI_Delete_function *delete_fn,int *keyval,void *extra_state)
>>>>>>>> {
>>>>>>>>   int i;
>>>>>>>>
>>>>>>>>   if (num_attr >= MAX_ATTR){
>>>>>>>>     for (i=0; i<num_attr; i++) {
>>>>>>>>       if (!attr_keyval[i].extra_state) {
>>>>>>>>
>>>>>>> attr_keyval[i].extra_state is provided by user (could be NULL). We
>>>>>>> can not rely on it.
>>>>>>>
>>>>>>>>         /* reuse this slot */
>>>>>>>>         attr_keyval[i].extra_state = extra_state;
>>>>>>>>         attr_keyval[i].del = delete_fn;
>>>>>>>>         *keyval = i;
>>>>>>>>         return MPI_SUCCESS;
>>>>>>>>       }
>>>>>>>>     }
>>>>>>>>     return MPIUni_Abort(MPI_COMM_WORLD,1);
>>>>>>>>   }
>>>>>>>>   attr_keyval[num_attr].extra_state = extra_state;
>>>>>>>>   attr_keyval[num_attr].del = delete_fn;
>>>>>>>>   *keyval = num_attr++;
>>>>>>>>   return MPI_SUCCESS;
>>>>>>>> }
>>>>>>>>
>>>>>>>> This will work if the user creates tons of attributes but is
>>>>>>>> constantly deleting some as they create new ones.
>>>>>>>> So long as the number outstanding at one time is < MAX_ATTR.
>>>>>>>>
>>>>>>>> Barry
>>>>>>>>
>>>>>>>>
>>>>>>>> On Jun 20, 2020, at 10:54 AM, Junchao Zhang <[email protected]> wrote:
>>>>>>>>
>>>>>>>> I don't understand what your session means. Let's try this patch
>>>>>>>>
>>>>>>>> diff --git a/src/sys/mpiuni/mpi.c b/src/sys/mpiuni/mpi.c
>>>>>>>> index d559a513..c058265d 100644
>>>>>>>> --- a/src/sys/mpiuni/mpi.c
>>>>>>>> +++ b/src/sys/mpiuni/mpi.c
>>>>>>>> @@ -283,6 +283,7 @@ int MPI_Finalize(void)
>>>>>>>>    MPI_Comm_free(&comm);
>>>>>>>>    comm = MPI_COMM_SELF;
>>>>>>>>    MPI_Comm_free(&comm);
>>>>>>>> +  num_attr = 1; /* reset the counter */
>>>>>>>>    MPI_was_finalized = 1;
>>>>>>>>    return MPI_SUCCESS;
>>>>>>>>  }
>>>>>>>>
>>>>>>>>
>>>>>>>> --Junchao Zhang
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, Jun 20, 2020 at 10:48 AM Sam Guo <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Typo: I mean “Assuming initializer is only needed once for entire
>>>>>>>>> session”
>>>>>>>>>
>>>>>>>>> On Saturday, June 20, 2020, Sam Guo <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Assuming finalizer is only needed once for entire session(?), I
>>>>>>>>>> can put initializer into the static block to call it once but where
>>>>>>>>>> do I call finalizer?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Saturday, June 20, 2020, Junchao Zhang <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> The counter num_attr should be recycled. But first try to call
>>>>>>>>>>> PETSc initialize/Finalize only once to see if it fixes the error.
>>>>>>>>>>> --Junchao Zhang
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Jun 20, 2020 at 12:48 AM Sam Guo <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> To clarify, I call PETSc initialize and PETSc finalize
>>>>>>>>>>>> every time I call SLEPc:
>>>>>>>>>>>>
>>>>>>>>>>>> PetscInitializeNoPointers(argc,args,nullptr,nullptr);
>>>>>>>>>>>>
>>>>>>>>>>>> SlepcInitialize(&argc,&args,static_cast<char*>(nullptr),help);
>>>>>>>>>>>>
>>>>>>>>>>>> //calling slepc
>>>>>>>>>>>>
>>>>>>>>>>>> SlepcFinalize();
>>>>>>>>>>>>
>>>>>>>>>>>> PetscFinalize();
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jun 19, 2020 at 10:32 PM Sam Guo <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Dear PETSc team,
>>>>>>>>>>>>> When I called SLEPc multiple times, I eventually got the
>>>>>>>>>>>>> following error:
>>>>>>>>>>>>>
>>>>>>>>>>>>> MPI operation not supported by PETSc's sequential MPI wrappers
>>>>>>>>>>>>> [0]PETSC ERROR: #1 PetscInitialize() line 967 in
>>>>>>>>>>>>> ../../../petsc/src/sys/objects/pinit.c
>>>>>>>>>>>>> [0]PETSC ERROR: #2 SlepcInitialize() line 262 in
>>>>>>>>>>>>> ../../../slepc/src/sys/slepcinit.c
>>>>>>>>>>>>> [0]PETSC ERROR: #3 SlepcInitializeNoPointers() line 359 in
>>>>>>>>>>>>> ../../../slepc/src/sys/slepcinit.c
>>>>>>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>>>>>>>
>>>>>>>>>>>>> I debugged: it is because of the following in
>>>>>>>>>>>>> petsc/src/sys/mpiuni/mpi.c
>>>>>>>>>>>>>
>>>>>>>>>>>>> if (num_attr >= MAX_ATTR)
>>>>>>>>>>>>>
>>>>>>>>>>>>> in function int MPI_Comm_create_keyval(MPI_Copy_function
>>>>>>>>>>>>> *copy_fn,MPI_Delete_function *delete_fn,int *keyval,void
>>>>>>>>>>>>> *extra_state)
>>>>>>>>>>>>>
>>>>>>>>>>>>> num_attr is declared static and keeps increasing every
>>>>>>>>>>>>> time MPI_Comm_create_keyval is called.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am using petsc 3.11.3 but found 3.13.2 has the same logic.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is this a bug, or did I not use it correctly?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Sam
>>>>>>>>>>>>>
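
For readers following the thread, here is a minimal standalone model of the serial failure discussed above: a static counter that only grows, so repeated initialize/finalize cycles eventually hit the limit unless finalize resets it. This is a sketch, not the PETSc source. Every `sketch_*` name, the `SKETCH_*` constants, and the tiny table size are assumptions chosen for illustration, and the parallel failure (which goes through a real MPI) is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: none of these names or values come from PETSc. */
#define SKETCH_MAX_ATTR 8   /* deliberately tiny so exhaustion shows up quickly */
#define SKETCH_SUCCESS  0
#define SKETCH_ERROR    1

/* Static counter, as described in the thread: it only grows unless
   finalize resets it. Index 0 is treated as reserved, matching the
   "num_attr = 1" reset in the patch. */
static int sketch_num_attr = 1;

/* Hand out the next keyval, or fail once the table is full. */
int sketch_create_keyval(int *keyval)
{
  assert(keyval != NULL);
  if (sketch_num_attr >= SKETCH_MAX_ATTR) return SKETCH_ERROR;
  *keyval = sketch_num_attr++;
  return SKETCH_SUCCESS;
}

/* reset_counter == 0 models the behaviour reported in the thread;
   reset_counter != 0 models the one-line patch. */
void sketch_finalize(int reset_counter)
{
  if (reset_counter) sketch_num_attr = 1;
}

/* One initialize/finalize cycle that creates nkeys keyvals. */
int sketch_session(int nkeys, int reset_counter)
{
  int i, keyval = 0, status = SKETCH_SUCCESS;
  for (i = 0; i < nkeys && status == SKETCH_SUCCESS; i++) {
    status = sketch_create_keyval(&keyval);
  }
  sketch_finalize(reset_counter);
  return status;
}

/* Count how many of nsessions consecutive cycles succeed. */
int sketch_count_ok_sessions(int nsessions, int nkeys, int reset_counter)
{
  int s, ok = 0;
  sketch_num_attr = 1; /* start from a fresh table */
  for (s = 0; s < nsessions; s++) {
    if (sketch_session(nkeys, reset_counter) == SKETCH_SUCCESS) ok++;
  }
  return ok;
}
```

With these numbers, `sketch_count_ok_sessions(10, 3, 0)` reports 2 successful cycles before the table is exhausted, while `sketch_count_ok_sessions(10, 3, 1)` reports all 10, which is the effect the counter reset is meant to have in the serial case.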
