On Mon, Jun 29, 2020 at 1:00 PM Sam Guo <[email protected]> wrote:
> Hi Junchao,
> I'll test ex53. In the meantime, I am using the following workaround:
> my program calls MPI initialize once for the entire program
> PetscInitialize once for the entire program
> SlepcInitialize once for the entire program (I think I can skip PetscInitialize above)
> calling SLEPc multiple times
> my program calls MPI finalize before ending
>
> You can see I skip PetscFinalize/SlepcFinalize. I am uneasy about skipping them since I am not sure what the consequences are. Can you comment on it?
>

It should be fine. MPI_Finalize does not free objects created by MPI. But since you end your program after MPI_Finalize, there should be no memory leaks. In general, one needs to call PetscFinalize/SlepcFinalize. Try to get a minimal working example and then we can have a look.

>
> Thanks,
> Sam
>
>
>
> On Fri, Jun 26, 2020 at 6:58 PM Junchao Zhang <[email protected]> wrote:
>
>> Did the test included in that commit fail in your environment? You can also change the test by adding calls to SlepcInitialize/SlepcFinalize between PetscInitializeNoPointers/PetscFinalize, as in my previous email.
>>
>> --Junchao Zhang
>>
>>
>> On Fri, Jun 26, 2020 at 5:54 PM Sam Guo <[email protected]> wrote:
>>
>>> Hi Junchao,
>>> If you are talking about this commit of yours
>>> https://gitlab.com/petsc/petsc/-/commit/f0463fa09df52ce43e7c5bf47a1c87df0c9e5cbb
>>>
>>> Recycle keyvals and fix bugs in MPI_Comm creation
>>> I think I got it. It fixes the serial case, but the parallel one is still crashing.
>>>
>>> Thanks,
>>> Sam
>>>
>>> On Fri, Jun 26, 2020 at 3:43 PM Sam Guo <[email protected]> wrote:
>>>
>>>> Hi Junchao,
>>>> I am not ready to upgrade PETSc yet (due to the lengthy technical and legal approval process of our internal policy). Can you send me the diff file so I can apply it to PETSc 3.11.3?
>>>>
>>>> Thanks,
>>>> Sam
>>>>
>>>> On Fri, Jun 26, 2020 at 3:33 PM Junchao Zhang <[email protected]> wrote:
>>>>
>>>>> Sam,
>>>>> Please discard the original patch I sent you. A better fix is already in maint/master. A test is in src/sys/tests/ex53.c.
>>>>> I modified that test at the end with
>>>>>
>>>>>   for (i=0; i<500; i++) {
>>>>>     ierr = PetscInitializeNoPointers(argc,argv,NULL,help);if (ierr) return ierr;
>>>>>     ierr = SlepcInitialize(&argc,&argv,NULL,help);if (ierr) return ierr;
>>>>>     ierr = SlepcFinalize();if (ierr) return ierr;
>>>>>     ierr = PetscFinalize();if (ierr) return ierr;
>>>>>   }
>>>>>
>>>>>
>>>>> and then ran it with multiple MPI ranks, and it ran correctly. So try your program with PETSc master first. If that does not work, see if you can come up with a test example for us.
>>>>>
>>>>> Thanks.
>>>>> --Junchao Zhang
>>>>>
>>>>>
>>>>> On Fri, Jun 26, 2020 at 3:37 PM Sam Guo <[email protected]> wrote:
>>>>>
>>>>>> One workaround for me is to call PetscInitialize once for my entire program and skip PetscFinalize (since I don't have a good place to call PetscFinalize before ending the program).
>>>>>>
>>>>>> On Fri, Jun 26, 2020 at 1:33 PM Sam Guo <[email protected]> wrote:
>>>>>>
>>>>>>> I get the crash after calling Initialize/Finalize multiple times. Junchao fixed the bug for serial, but parallel still crashes.
>>>>>>>
>>>>>>> On Fri, Jun 26, 2020 at 1:28 PM Barry Smith <[email protected]> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Ah, so you get the crash the second time you call PetscInitialize()? That is a problem because we do intend to support that capability (but you must call PetscFinalize() each time also).
>>>>>>>>
>>>>>>>> Barry
>>>>>>>>
>>>>>>>>
>>>>>>>> On Jun 26, 2020, at 3:25 PM, Sam Guo <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Hi Barry,
>>>>>>>> Thanks for the quick response.
>>>>>>>> I will call PetscInitialize once and skip PetscFinalize for now to avoid the crash. The crash is actually in PetscInitialize, not PetscFinalize.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Sam
>>>>>>>>
>>>>>>>> On Fri, Jun 26, 2020 at 1:21 PM Barry Smith <[email protected]> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Sam,
>>>>>>>>>
>>>>>>>>> You can skip PetscFinalize() so long as you only call PetscInitialize() once. It is not desirable in general to skip the finalize, because PETSc can't free all its data structures and you cannot see the PETSc logging information with -log_view. But in terms of the code running correctly, you do not need to call PetscFinalize.
>>>>>>>>>
>>>>>>>>> If your code crashes in PetscFinalize(), please send the full error output and we can try to help you debug it.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Barry
>>>>>>>>>
>>>>>>>>> On Jun 26, 2020, at 3:14 PM, Sam Guo <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> To clarify, we have an MPI wrapper (so we can switch to a different MPI at runtime). I compile PETSc using our MPI wrapper.
>>>>>>>>> If I just call PETSc initialize once without calling finalize, it is OK. My question to you is: can I skip finalize?
>>>>>>>>> Our program calls MPI_Finalize at the end anyway.
>>>>>>>>>
>>>>>>>>> On Fri, Jun 26, 2020 at 1:09 PM Sam Guo <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Junchao,
>>>>>>>>>> Attached please find the configure.log.
>>>>>>>>>> I also attach the pinit.c, which contains your patch (I am currently using 3.11.3 and have applied your patch to it). Your patch fixes the serial version. The error now is in the parallel case.
>>>>>>>>>> Here is the error log:
>>>>>>>>>>
>>>>>>>>>> [1]PETSC ERROR: #1 PetscInitialize() line 969 in ../../../petsc/src/sys/objects/pinit.c
>>>>>>>>>> [1]PETSC ERROR: #2 checkError() line 56 in ../../../physics/src/eigensolver/SLEPc.cpp
>>>>>>>>>> [1]PETSC ERROR: #3 PetscInitialize() line 966 in ../../../petsc/src/sys/objects/pinit.c
>>>>>>>>>> [1]PETSC ERROR: #4 SlepcInitialize() line 262 in ../../../slepc/src/sys/slepcinit.c
>>>>>>>>>> [0]PETSC ERROR: #1 PetscInitialize() line 969 in ../../../petsc/src/sys/objects/pinit.c
>>>>>>>>>> [0]PETSC ERROR: #2 checkError() line 56 in ../../../physics/src/eigensolver/SLEPc.cpp
>>>>>>>>>> [0]PETSC ERROR: #3 PetscInitialize() line 966 in ../../../petsc/src/sys/objects/pinit.c
>>>>>>>>>> [0]PETSC ERROR: #4 SlepcInitialize() line 262 in ../../../slepc/src/sys/slepcinit.c
>>>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>>>>
>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
>>>>>>>>>> with errorcode 56.
>>>>>>>>>>
>>>>>>>>>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>>>>>>>>>> You may or may not see output from other processes, depending on
>>>>>>>>>> exactly when Open MPI kills them.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Sam
>>>>>>>>>>
>>>>>>>>>> On Thu, Jun 25, 2020 at 7:37 PM Junchao Zhang <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Sam,
>>>>>>>>>>> The MPI_Comm_create_keyval() error was fixed in maint/master. From the error message, it seems you need to configure with --with-log=1.
>>>>>>>>>>> Otherwise, please send your full error stack trace and configure.log.
>>>>>>>>>>> Thanks.
>>>>>>>>>>> --Junchao Zhang
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jun 25, 2020 at 2:18 PM Sam Guo <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Junchao,
>>>>>>>>>>>> I have now encountered the same error in parallel. I am wondering whether a parallel fix is needed as well.
>>>>>>>>>>>> [1]PETSC ERROR: #1 PetscInitialize() line 969 in ../../../petsc/src/sys/objects/pinit.c
>>>>>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Jun 20, 2020 at 7:35 PM Sam Guo <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Junchao,
>>>>>>>>>>>>> Your patch works.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Sam
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Jun 20, 2020 at 4:23 PM Junchao Zhang <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, Jun 20, 2020 at 12:24 PM Barry Smith <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Junchao,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This is a good bug fix. It solves the problem when PETSc initialize is called many times.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> There is another fix you can make to keep PETSc's mpiuni from running out of attributes within a single PETSc run:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> int MPI_Comm_create_keyval(MPI_Copy_function *copy_fn,MPI_Delete_function *delete_fn,int *keyval,void *extra_state)
>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>   int i;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   if (num_attr >= MAX_ATTR) {
>>>>>>>>>>>>>>>     for (i=0; i<num_attr; i++) {
>>>>>>>>>>>>>>>       if (!attr_keyval[i].extra_state) {
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> attr_keyval[i].extra_state is provided by the user (it could be NULL). We cannot rely on it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         /* reuse this slot */
>>>>>>>>>>>>>>>         attr_keyval[i].extra_state = extra_state;
>>>>>>>>>>>>>>>         attr_keyval[i].del = delete_fn;
>>>>>>>>>>>>>>>         *keyval = i;
>>>>>>>>>>>>>>>         return MPI_SUCCESS;
>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>     return MPIUni_Abort(MPI_COMM_WORLD,1);
>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>   attr_keyval[num_attr].extra_state = extra_state;
>>>>>>>>>>>>>>>   attr_keyval[num_attr].del = delete_fn;
>>>>>>>>>>>>>>>   *keyval = num_attr++;
>>>>>>>>>>>>>>>   return MPI_SUCCESS;
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This will work if the user creates tons of attributes but is constantly deleting some as they create new ones (so long as the number outstanding at one time is < MAX_ATTR).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Barry
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Jun 20, 2020, at 10:54 AM, Junchao Zhang <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I don't understand what your session means.
>>>>>>>>>>>>>>> Let's try this patch:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> diff --git a/src/sys/mpiuni/mpi.c b/src/sys/mpiuni/mpi.c
>>>>>>>>>>>>>>> index d559a513..c058265d 100644
>>>>>>>>>>>>>>> --- a/src/sys/mpiuni/mpi.c
>>>>>>>>>>>>>>> +++ b/src/sys/mpiuni/mpi.c
>>>>>>>>>>>>>>> @@ -283,6 +283,7 @@ int MPI_Finalize(void)
>>>>>>>>>>>>>>>    MPI_Comm_free(&comm);
>>>>>>>>>>>>>>>    comm = MPI_COMM_SELF;
>>>>>>>>>>>>>>>    MPI_Comm_free(&comm);
>>>>>>>>>>>>>>> +  num_attr = 1; /* reset the counter */
>>>>>>>>>>>>>>>    MPI_was_finalized = 1;
>>>>>>>>>>>>>>>    return MPI_SUCCESS;
>>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --Junchao Zhang
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sat, Jun 20, 2020 at 10:48 AM Sam Guo <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Typo: I mean "Assuming initializer is only needed once for entire session"
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Saturday, June 20, 2020, Sam Guo <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Assuming finalizer is only needed once for entire session(?), I can put initializer into the static block to call it once, but where do I call finalizer?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Saturday, June 20, 2020, Junchao Zhang <[email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The counter num_attr should be recycled. But first, try calling PETSc initialize/finalize only once to see if it fixes the error.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --Junchao Zhang
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Sat, Jun 20, 2020 at 12:48 AM Sam Guo <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> To clarify, I call PETSc initialize and PETSc finalize every time I call SLEPc:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> PetscInitializeNoPointers(argc,args,nullptr,nullptr);
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> SlepcInitialize(&argc,&args,static_cast<char*>(nullptr),help);
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> // calling SLEPc
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> SlepcFinalize();
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> PetscFinalize();
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Fri, Jun 19, 2020 at 10:32 PM Sam Guo <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Dear PETSc team,
>>>>>>>>>>>>>>>>>>>> When I called SLEPc multiple times, I eventually got the following error:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> MPI operation not supported by PETSc's sequential MPI wrappers
>>>>>>>>>>>>>>>>>>>> [0]PETSC ERROR: #1 PetscInitialize() line 967 in ../../../petsc/src/sys/objects/pinit.c
>>>>>>>>>>>>>>>>>>>> [0]PETSC ERROR: #2 SlepcInitialize() line 262 in ../../../slepc/src/sys/slepcinit.c
>>>>>>>>>>>>>>>>>>>> [0]PETSC ERROR: #3 SlepcInitializeNoPointers() line 359 in ../../../slepc/src/sys/slepcinit.c
>>>>>>>>>>>>>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>>>>>>>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I debugged it: the cause is the following check in petsc/src/sys/mpiuni/mpi.c
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> if (num_attr >= MAX_ATTR)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> in the function int MPI_Comm_create_keyval(MPI_Copy_function *copy_fn,MPI_Delete_function *delete_fn,int *keyval,void *extra_state)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> num_attr is declared static and keeps increasing every time MPI_Comm_create_keyval is called.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I am using PETSc 3.11.3, but I found that 3.13.2 has the same logic.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Is this a bug, or am I not using it correctly?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>> Sam
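Junchao's objection to Barry's slot-recycling idea is that `extra_state` is user data and may legitimately be NULL, so a NULL test cannot tell a free slot from a live one. One way around that is to track slot ownership with a separate `active` flag that is cleared when the keyval is freed. The sketch below is an illustrative C model under that assumption. It is not the fix that went into PETSc: the commit quoted in the thread is titled only "Recycle keyvals and fix bugs in MPI_Comm creation". The types and the `sketch_create_keyval`/`sketch_free_keyval` names are hypothetical. The sketch also ignores the MPI rule that a freed keyval stays alive until the attributes using it are deleted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the mpiuni keyval table. */
#define MAX_ATTR 4
#define SKETCH_KEYVAL_INVALID (-1)

typedef int Sketch_Delete_function(int keyval, void *extra_state);

typedef struct {
  int                     active;      /* 1 while the keyval is allocated */
  void                   *extra_state; /* user data; NULL is a legal value */
  Sketch_Delete_function *del;
} KeyvalSlot;

static KeyvalSlot attr_keyval[MAX_ATTR];
static int        num_attr = 1; /* high-water mark; slot 0 is never handed out */

/* Returns 0 on success, 1 if every slot is in use. */
int sketch_create_keyval(Sketch_Delete_function *delete_fn, int *keyval, void *extra_state)
{
  int i, slot = -1;

  assert(keyval != NULL);
  /* Prefer a previously freed slot below the high-water mark. */
  for (i = 1; i < num_attr; i++) {
    if (!attr_keyval[i].active) { slot = i; break; }
  }
  if (slot < 0) {
    if (num_attr >= MAX_ATTR) return 1; /* genuinely full */
    slot = num_attr++;
  }
  attr_keyval[slot].active      = 1;
  attr_keyval[slot].extra_state = extra_state;
  attr_keyval[slot].del         = delete_fn;
  *keyval = slot;
  return 0;
}

/* Marks a keyval free so its slot can be reused; returns 1 for an invalid key. */
int sketch_free_keyval(int *keyval)
{
  int k;

  assert(keyval != NULL);
  k = *keyval;
  if (k < 1 || k >= num_attr || !attr_keyval[k].active) return 1;
  attr_keyval[k].active      = 0;
  attr_keyval[k].extra_state = NULL;
  attr_keyval[k].del         = NULL;
  *keyval = SKETCH_KEYVAL_INVALID;
  return 0;
}
```

With a capacity of 4, three keyvals fill slots 1 to 3, and a fourth request fails until one of them is freed. After that, the freed slot is handed out again, even though every `extra_state` passed in was NULL.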
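The arrangement the thread converges on is to initialize once, call SLEPc many times, and finalize once before the host program's own `MPI_Finalize()`. Finalizing still matters: Barry notes that without it, PETSc cannot free its data structures or print `-log_view` output. That arrangement can be packaged as a small guard so callers never need to know whether initialization has already happened. The C sketch below uses hypothetical stand-ins, `lib_init` and `lib_finalize`, that only count calls. In a real program they would wrap `SlepcInitialize()` and `SlepcFinalize()`, which, as Sam suspects, can also take care of PETSc initialization. `solver_shutdown()` would then be called from the application's shutdown path before it finalizes MPI. All names here are illustrative, not part of PETSc or SLEPc.

```c
#include <assert.h>

static int lib_is_initialized = 0;

/* Hypothetical stand-ins that only count calls. In a real program these
   would call SlepcInitialize()/SlepcFinalize() and return their error codes. */
static int init_calls = 0;
static int finalize_calls = 0;

static int lib_init(void)
{
  assert(!lib_is_initialized); /* never initialize twice without finalizing */
  init_calls++;
  return 0;
}

static int lib_finalize(void)
{
  assert(lib_is_initialized); /* only finalize what was initialized */
  finalize_calls++;
  return 0;
}

/* Call before every solve: initializes the library on first use only. */
int solver_acquire(void)
{
  if (!lib_is_initialized) {
    int ierr = lib_init();
    if (ierr) return ierr;
    lib_is_initialized = 1;
  }
  return 0;
}

/* Call once from the application's shutdown path, before MPI_Finalize().
   Does nothing if the library is not currently initialized. */
int solver_shutdown(void)
{
  if (lib_is_initialized) {
    int ierr = lib_finalize();
    if (ierr) return ierr;
    lib_is_initialized = 0;
  }
  return 0;
}
```

Calling `solver_acquire()` before each of five solves triggers a single initialization, and a second `solver_shutdown()` is harmless. Calling `solver_acquire()` again after shutdown would re-initialize. That is the repeated initialize/finalize pattern that crashed under PETSc 3.11.3 in this thread, so the guard is meant to be shut down only once, at program exit. The guard is not thread-safe; a multithreaded host would need a lock or a once-only primitive around it.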
