Hi Junchao,
I am not ready to upgrade petsc yet (due to the lengthy technical and legal approval process required by our internal policy). Can you send me the diff file so I can apply it to petsc 3.11.3?
Thanks,
Sam

On Fri, Jun 26, 2020 at 3:33 PM Junchao Zhang <[email protected]> wrote:

> Sam,
> Please discard the original patch I sent you. A better fix is already in
> maint/master. A test is at src/sys/tests/ex53.c
> I modified that test at the end with
>
> for (i=0; i<500; i++) {
>   ierr = PetscInitializeNoPointers(argc,argv,NULL,help);if (ierr) return ierr;
>   ierr = SlepcInitialize(&argc,&argv,NULL,help);if (ierr) return ierr;
>   ierr = SlepcFinalize();if (ierr) return ierr;
>   ierr = PetscFinalize();if (ierr) return ierr;
> }
>
> then I ran it with multiple mpi ranks and it ran correctly. So try your
> program with petsc master first. If it does not work, see if you can come
> up with a test example for us.
>
> Thanks.
> --Junchao Zhang
>
> On Fri, Jun 26, 2020 at 3:37 PM Sam Guo <[email protected]> wrote:
>
>> One workaround for me is to call PetscInitialize once for my entire
>> program and skip PetscFinalize (since I don't have a good place to call
>> PetscFinalize before ending the program).
>>
>> On Fri, Jun 26, 2020 at 1:33 PM Sam Guo <[email protected]> wrote:
>>
>>> I get the crash after calling Initialize/Finalize multiple times.
>>> Junchao fixed the bug for serial but parallel still crashes.
>>>
>>> On Fri, Jun 26, 2020 at 1:28 PM Barry Smith <[email protected]> wrote:
>>>
>>>>   Ah, so you get the crash the second time you call PetscInitialize()?
>>>> That is a problem because we do intend to support that capability (but you
>>>> must call PetscFinalize() each time also).
>>>>
>>>>   Barry
>>>>
>>>> On Jun 26, 2020, at 3:25 PM, Sam Guo <[email protected]> wrote:
>>>>
>>>> Hi Barry,
>>>>    Thanks for the quick response.
>>>>    I will call PetscInitialize once and skip the PetscFinalize for now
>>>> to avoid the crash. The crash is actually in PetscInitialize, not
>>>> PetscFinalize.
>>>>
>>>> Thanks,
>>>> Sam
>>>>
>>>> On Fri, Jun 26, 2020 at 1:21 PM Barry Smith <[email protected]> wrote:
>>>>
>>>>>   Sam,
>>>>>
>>>>>   You can skip PetscFinalize() so long as you only call
>>>>> PetscInitialize() once. It is not desirable in general to skip the finalize
>>>>> because PETSc can't free all its data structures and you cannot see the
>>>>> PETSc logging information with -log_view but in terms of the code running
>>>>> correctly you do not need to call PetscFinalize.
>>>>>
>>>>>    If your code crashes in PetscFinalize() please send the full error
>>>>> output and we can try to help you debug it.
>>>>>
>>>>>    Barry
>>>>>
>>>>> On Jun 26, 2020, at 3:14 PM, Sam Guo <[email protected]> wrote:
>>>>>
>>>>> To clarify, we have an mpi wrapper (so we can switch to a different mpi
>>>>> at runtime). I compile petsc using our mpi wrapper.
>>>>> If I just call PETSc initialize once without calling finalize, it is
>>>>> ok. My question to you is: can I skip finalize?
>>>>> Our program calls mpi_finalize at the end anyway.
>>>>>
>>>>> On Fri, Jun 26, 2020 at 1:09 PM Sam Guo <[email protected]> wrote:
>>>>>
>>>>>> Hi Junchao,
>>>>>>    Attached please find the configure.log.
>>>>>>    I also attach the pinit.c which contains your patch (I am
>>>>>> currently using 3.11.3. I've applied your patch to 3.11.3). Your patch
>>>>>> fixes the serial version. The error now is about the parallel.
>>>>>> Here is the error log:
>>>>>>
>>>>>> [1]PETSC ERROR: #1 PetscInitialize() line 969 in ../../../petsc/src/sys/objects/pinit.c
>>>>>> [1]PETSC ERROR: #2 checkError() line 56 in ../../../physics/src/eigensolver/SLEPc.cpp
>>>>>> [1]PETSC ERROR: #3 PetscInitialize() line 966 in ../../../petsc/src/sys/objects/pinit.c
>>>>>> [1]PETSC ERROR: #4 SlepcInitialize() line 262 in ../../../slepc/src/sys/slepcinit.c
>>>>>> [0]PETSC ERROR: #1 PetscInitialize() line 969 in ../../../petsc/src/sys/objects/pinit.c
>>>>>> [0]PETSC ERROR: #2 checkError() line 56 in ../../../physics/src/eigensolver/SLEPc.cpp
>>>>>> [0]PETSC ERROR: #3 PetscInitialize() line 966 in ../../../petsc/src/sys/objects/pinit.c
>>>>>> [0]PETSC ERROR: #4 SlepcInitialize() line 262 in ../../../slepc/src/sys/slepcinit.c
>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>> You might have forgotten to call PetscInitialize().
>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
>>>>>> with errorcode 56.
>>>>>>
>>>>>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>>>>>> You may or may not see output from other processes, depending on
>>>>>> exactly when Open MPI kills them.
>>>>>>
>>>>>> Thanks,
>>>>>> Sam
>>>>>>
>>>>>> On Thu, Jun 25, 2020 at 7:37 PM Junchao Zhang <[email protected]> wrote:
>>>>>>
>>>>>>> Sam,
>>>>>>>    The MPI_Comm_create_keyval() error was fixed in maint/master.
>>>>>>> From the error message, it seems you need to configure --with-log=1
>>>>>>>    Otherwise, please send your full error stack trace and
>>>>>>> configure.log.
>>>>>>>   Thanks.
>>>>>>> --Junchao Zhang
>>>>>>>
>>>>>>> On Thu, Jun 25, 2020 at 2:18 PM Sam Guo <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Junchao,
>>>>>>>>    I now encountered the same error with parallel. I am wondering
>>>>>>>> if there is a need for parallel fix as well.
>>>>>>>> [1]PETSC ERROR: #1 PetscInitialize() line 969 in ../../../petsc/src/sys/objects/pinit.c
>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>>
>>>>>>>> On Sat, Jun 20, 2020 at 7:35 PM Sam Guo <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Junchao,
>>>>>>>>>    Your patch works.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Sam
>>>>>>>>>
>>>>>>>>> On Sat, Jun 20, 2020 at 4:23 PM Junchao Zhang <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> On Sat, Jun 20, 2020 at 12:24 PM Barry Smith <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>>    Junchao,
>>>>>>>>>>>
>>>>>>>>>>>      This is a good bug fix. It solves the problem when PETSc
>>>>>>>>>>> initialize is called many times.
>>>>>>>>>>>
>>>>>>>>>>>      There is another fix you can do to limit PETSc mpiuni
>>>>>>>>>>> running out of attributes inside a single PETSc run:
>>>>>>>>>>>
>>>>>>>>>>> int MPI_Comm_create_keyval(MPI_Copy_function *copy_fn,MPI_Delete_function *delete_fn,int *keyval,void *extra_state)
>>>>>>>>>>> {
>>>>>>>>>>>
>>>>>>>>>>>  if (num_attr >= MAX_ATTR){
>>>>>>>>>>>    for (i=0; i<num_attr; i++) {
>>>>>>>>>>>      if (!attr_keyval[i].extra_state) {
>>>>>>>>>>>
>>>>>>>>>> attr_keyval[i].extra_state is provided by user (could be NULL).
>>>>>>>>>> We can not rely on it.
>>>>>>>>>>
>>>>>>>>>>>        /* reuse this slot */
>>>>>>>>>>>        attr_keyval[i].extra_state = extra_state;
>>>>>>>>>>>        attr_keyval[i].del = delete_fn;
>>>>>>>>>>>        *keyval = i;
>>>>>>>>>>>        return MPI_SUCCESS;
>>>>>>>>>>>      }
>>>>>>>>>>>    }
>>>>>>>>>>>    return MPIUni_Abort(MPI_COMM_WORLD,1);
>>>>>>>>>>>  }
>>>>>>>>>>>  return MPIUni_Abort(MPI_COMM_WORLD,1);
>>>>>>>>>>>  attr_keyval[num_attr].extra_state = extra_state;
>>>>>>>>>>>  attr_keyval[num_attr].del = delete_fn;
>>>>>>>>>>>  *keyval = num_attr++;
>>>>>>>>>>>  return MPI_SUCCESS;
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>>   This will work if the user creates tons of attributes but is
>>>>>>>>>>> constantly deleting some as they create new ones. So long as the number
>>>>>>>>>>> outstanding at one time is < MAX_ATTR.
>>>>>>>>>>>
>>>>>>>>>>>   Barry
>>>>>>>>>>>
>>>>>>>>>>> On Jun 20, 2020, at 10:54 AM, Junchao Zhang <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> I don't understand what your session means.
>>>>>>>>>>> Let's try this patch
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/src/sys/mpiuni/mpi.c b/src/sys/mpiuni/mpi.c
>>>>>>>>>>> index d559a513..c058265d 100644
>>>>>>>>>>> --- a/src/sys/mpiuni/mpi.c
>>>>>>>>>>> +++ b/src/sys/mpiuni/mpi.c
>>>>>>>>>>> @@ -283,6 +283,7 @@ int MPI_Finalize(void)
>>>>>>>>>>>    MPI_Comm_free(&comm);
>>>>>>>>>>>    comm = MPI_COMM_SELF;
>>>>>>>>>>>    MPI_Comm_free(&comm);
>>>>>>>>>>> +  num_attr = 1; /* reset the counter */
>>>>>>>>>>>    MPI_was_finalized = 1;
>>>>>>>>>>>    return MPI_SUCCESS;
>>>>>>>>>>>  }
>>>>>>>>>>>
>>>>>>>>>>> --Junchao Zhang
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Jun 20, 2020 at 10:48 AM Sam Guo <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Typo: I mean “Assuming initializer is only needed once for
>>>>>>>>>>>> entire session”
>>>>>>>>>>>>
>>>>>>>>>>>> On Saturday, June 20, 2020, Sam Guo <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Assuming finalizer is only needed once for entire session(?),
>>>>>>>>>>>>> I can put initializer into the static block to call it once but
>>>>>>>>>>>>> where do I call finalizer?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Saturday, June 20, 2020, Junchao Zhang <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> The counter num_attr should be recycled. But first try to
>>>>>>>>>>>>>> call PETSc initialize/Finalize only once to see if it fixes the
>>>>>>>>>>>>>> error.
>>>>>>>>>>>>>> --Junchao Zhang
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, Jun 20, 2020 at 12:48 AM Sam Guo <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> To clarify, I call PETSc initialize and PETSc finalize
>>>>>>>>>>>>>>> every time I call SLEPc:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   PetscInitializeNoPointers(argc,args,nullptr,nullptr);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   SlepcInitialize(&argc,&args,static_cast<char*>(nullptr),help);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   //calling slepc
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   SlepcFinalize();
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   PetscFinalize();
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Jun 19, 2020 at 10:32 PM Sam Guo <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Dear PETSc team,
>>>>>>>>>>>>>>>>    When I called SLEPc multiple times, I eventually got the
>>>>>>>>>>>>>>>> following error:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> MPI operation not supported by PETSc's sequential MPI wrappers
>>>>>>>>>>>>>>>> [0]PETSC ERROR: #1 PetscInitialize() line 967 in ../../../petsc/src/sys/objects/pinit.c
>>>>>>>>>>>>>>>> [0]PETSC ERROR: #2 SlepcInitialize() line 262 in ../../../slepc/src/sys/slepcinit.c
>>>>>>>>>>>>>>>> [0]PETSC ERROR: #3 SlepcInitializeNoPointers() line 359 in ../../../slepc/src/sys/slepcinit.c
>>>>>>>>>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>>>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   I debugged: it is because of the following in
>>>>>>>>>>>>>>>> petsc/src/sys/mpiuni/mpi.c
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   if (num_attr >= MAX_ATTR)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> in function int MPI_Comm_create_keyval(MPI_Copy_function *copy_fn,MPI_Delete_function *delete_fn,int *keyval,void *extra_state)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>  num_attr is declared static and keeps increasing every
>>>>>>>>>>>>>>>> time MPI_Comm_create_keyval is called.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I am using petsc 3.11.3 but found 3.13.2 has the same logic.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Is this a bug, or did I not use it correctly?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Sam
