Re: [OMPI devel] PMIx in 2.x

2016-11-08 Thread r...@open-mpi.org
Yes, they should find one another. You do have to pass the server’s URI on the mpirun cmd line so it can find it.

> On Nov 8, 2016, at 12:40 PM, Pieter Noordhuis wrote:
>
> I'll open an issue on GitHub with more details.
>
> As far as the accept/connect issue, do you expect the processes to fi

Re: [OMPI devel] PMIx in 2.x

2016-11-08 Thread Pieter Noordhuis
I'll open an issue on GitHub with more details. As far as the accept/connect issue, do you expect the processes to find each other if they do a publish/lookup through the same orte-server if they are started separately? I can try with master as well. I'm working off of 2.0.1 now (using stable

Re: [OMPI devel] PMIx in 2.x

2016-11-08 Thread r...@open-mpi.org
Should be handled more gracefully, of course. When a proc departs, we clean up any published storage that wasn’t specifically indicated to be retained for a longer period of time (see pmix_common.h for the various supported options). You’ve obviously found a race condition, and we’ll have to trac
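
The cleanup rule described above can be illustrated with a toy model. This is only a sketch of the stated behavior, not PMIx or ORTE code: the class ToyNameServer, its methods, and the boolean `retain` flag are all hypothetical stand-ins (the real retention options are the ones listed in pmix_common.h), and the model is single-threaded, so it does not reproduce the race condition itself.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    owner: int            # id of the process that published the entry
    value: str            # published data, e.g. a port name
    retain: bool = False  # hypothetical stand-in for a persistence option


class ToyNameServer:
    """Toy publish/lookup store; NOT the real orte-server or PMIx server."""

    def __init__(self):
        self._entries = {}  # service name -> list of Entry

    def publish(self, owner, name, value, retain=False):
        self._entries.setdefault(name, []).append(Entry(owner, value, retain))

    def lookup(self, name):
        # Returns every value currently stored under the name.
        return [e.value for e in self._entries.get(name, [])]

    def depart(self, owner):
        # On departure, drop the owner's entries unless marked for retention.
        for name in list(self._entries):
            kept = [e for e in self._entries[name]
                    if e.owner != owner or e.retain]
            if kept:
                self._entries[name] = kept
            else:
                del self._entries[name]


server = ToyNameServer()
server.publish(0, "svc", "port-a")
server.publish(1, "svc", "port-b", retain=True)
server.publish(2, "svc", "port-c")

before = server.lookup("svc")
server.depart(0)  # entry removed: not marked for retention
server.depart(1)  # entry kept: marked for retention
after = server.lookup("svc")
print(before, after)
```

In this model a lookup issued after the departures no longer sees the entry of process 0. The failure reported in this thread corresponds to a lookup that still returns entries whose publishers have already gone, which the toy model avoids only because cleanup and lookup never overlap here.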

Re: [OMPI devel] PMIx in 2.x

2016-11-08 Thread Pieter Noordhuis
... accidentally sent before finishing.

This error happened because lookup was returning multiple published entries, N-1 of which had already disconnected. Should this case be handled more gracefully, or is it expected to fail like this?

Thanks,
Pieter

From: Pie

Re: [OMPI devel] PMIx in 2.x

2016-11-08 Thread Pieter Noordhuis
Ah, that's good to know. I'm trying to wire things up through orte-server on 2.x now and am at the point where I can do an MPI_Publish_name on one node and MPI_Lookup_name on another (both started through their own mpirun calls). Then, when the first process does an MPI_Comm_accept and the secon