Re: [OMPI devel] RFC: fix leak of bml endpoints

2014-05-15 Thread Nathan Hjelm
OK, figured it out. There were three problems with the del_procs code: 1) ompi_mpi_finalize used ompi_proc_all to get the list of procs but never released the reference to them (ompi_proc_all called OBJ_RETAIN on all the procs returned). When calling del_procs at finalize it should
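The first problem above is a reference-count imbalance: a lookup that retains every object it returns, paired with a caller that never releases them. The following is a minimal, self-contained sketch of that pattern. It is not Open MPI code; every `toy_*` name is hypothetical and only models the retain/release behaviour described in the message.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted proc object. */
typedef struct {
    int refcount;
} toy_proc_t;

/* Number of toy procs that have not been freed yet. */
static int live_procs = 0;

/* Create a proc holding one reference (owned by the proc table). */
toy_proc_t *toy_proc_new(void)
{
    toy_proc_t *p = malloc(sizeof(*p));
    if (p == NULL) {
        return NULL;
    }
    p->refcount = 1;
    live_procs++;
    return p;
}

void toy_proc_retain(toy_proc_t *p)
{
    p->refcount++;
}

/* Drop one reference; free the proc when the last one goes away. */
void toy_proc_release(toy_proc_t *p)
{
    assert(p->refcount > 0);
    if (--p->refcount == 0) {
        live_procs--;
        free(p);
    }
}

/* Return a new array of all procs, retaining each one. This models the
 * described behaviour: the caller now owns one reference per entry. */
toy_proc_t **toy_proc_all(toy_proc_t **table, size_t n)
{
    toy_proc_t **list = malloc(n * sizeof(*list));
    if (list == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        toy_proc_retain(table[i]);
        list[i] = table[i];
    }
    return list;
}

/* The step the message says was missing: release every reference
 * taken by toy_proc_all, then free the array itself. */
void toy_proc_list_release(toy_proc_t **list, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        toy_proc_release(list[i]);
    }
    free(list);
}

int toy_live_procs(void)
{
    return live_procs;
}
```

In this model, a caller that skips `toy_proc_list_release` leaves every refcount one too high, so the table's final release never frees the procs.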

Re: [OMPI devel] RFC: fix leak of bml endpoints

2014-05-15 Thread Nathan Hjelm
On Thu, May 15, 2014 at 11:44:05AM -0600, Nathan Hjelm wrote:
> On Thu, May 15, 2014 at 01:33:31PM -0400, George Bosilca wrote:
> > The solution you propose here is definitely not OK. It is 1) ugly and 2)
> > breaks the separation barrier that we hold dear.
>
> Which is why I asked :)
>
> >

Re: [OMPI devel] RFC: fix leak of bml endpoints

2014-05-15 Thread Nathan Hjelm
On Thu, May 15, 2014 at 01:33:31PM -0400, George Bosilca wrote:
> The solution you propose here is definitely not OK. It is 1) ugly and 2)
> breaks the separation barrier that we hold dear.
Which is why I asked :)
> Regarding your other suggestion I don’t see any reasons not to call the
>

Re: [OMPI devel] RFC: fix leak of bml endpoints

2014-05-15 Thread George Bosilca
The solution you propose here is definitely not OK. It 1) is ugly and 2) breaks the separation barrier that we hold dear. Regarding your other suggestion, I don’t see any reason not to call the delete_proc on MPI_COMM_WORLD as the last action we do before tearing down everything else.

Re: [OMPI devel] r31765 causes crash in mpirun

2014-05-15 Thread Ralph Castain
I fixed this by reverting r31765 in r31775. I annotated the ticket with an explanation.
On May 15, 2014, at 1:20 AM, Gilles Gouaillardet wrote:
> Folks,
>
> since r31765 (opal/event: release the opal event context when closing
> the event base)
> mpirun crashes at the

[OMPI devel] RFC: fix leak of bml endpoints

2014-05-15 Thread Nathan Hjelm
What: We never call del_procs on the procs in comm world. This leads us to leak the bml endpoints created by r2. The proposed solution is not ideal, but it avoids adding a call to del_procs for comm world, something I know would require more discussion since there is likely a reason for that. I

Re: [OMPI devel] about btl/scif thread cancellation (#4616 / r31738)

2014-05-15 Thread Gilles Gouaillardet
Nathan, this had no effect on my environment :-( I am not sure you can reuse mca_btl_scif_module.scif_fd with connect(); I had to use a new scif fd for that. Then I ran into another glitch: if the listen thread does not scif_accept() the connection, the scif_connect() will take 30 seconds

[OMPI devel] r31765 causes crash in mpirun

2014-05-15 Thread Gilles Gouaillardet
Folks, since r31765 (opal/event: release the opal event context when closing the event base), mpirun crashes at the end of the job. For example:
$ mpirun --mca btl tcp,self -n 4 `pwd`/src/MPI_Allreduce_user_c
MPITEST info (0): Starting MPI_Allreduce_user() test
MPITEST_results: