Yes, they should find one another. You do have to pass the server’s URI on the
mpirun command line so that mpirun can find the server.
> On Nov 8, 2016, at 12:40 PM, Pieter Noordhuis wrote:
>
> I'll open an issue on GitHub with more details.
>
> As far as the accept/connect issue, do you expect the processes to fi
I'll open an issue on GitHub with more details.
As far as the accept/connect issue, do you expect the processes to find each
other if they do a publish/lookup through the same orte-server if they are
started separately?
I can try with master as well. I'm working off of 2.0.1 now (using stable
Should be handled more gracefully, of course. When a proc departs, we clean up
any published storage that wasn’t specifically indicated to be retained for a
longer period of time (see pmix_common.h for the various supported options).
You’ve obviously found a race condition, and we’ll have to trac
... accidentally sent before finishing.
This error happened because lookup was returning multiple published entries,
N-1 of which had already disconnected. Should this case be handled more
gracefully, or is it expected to fail like this?
Thanks,
Pieter
From: Pie
Ah, that's good to know. I'm trying to wire things up through orte-server on
2.x now and am at the point where I can do an MPI_Publish_name on one node and
MPI_Lookup_name on another (both started through their own mpirun calls). Then,
when the first process does an MPI_Comm_accept and the secon