Re: [OMPI users] Can't connect using MPI Ports

2017-11-09 Thread r...@open-mpi.org
I did a quick check across the v2.1 and v3.0 OMPI releases and both failed, 
though with different signatures. Looks like a problem in the OMPI dynamics 
integration (i.e., the PMIx library looked like it was doing the right things).

I’d suggest filing an issue on the OMPI github site so someone can address it 
(I don’t work much on OMPI any more, I’m afraid).


> On Nov 9, 2017, at 1:54 AM, Florian Lindner  wrote:
> 
>>> The MPI Ports functionality (chapter 10.4 of MPI 3.1), mainly consisting of 
>>> MPI_Open_port, MPI_Comm_accept and
>>> MPI_Comm_connect is not usable without running an ompi-server as a third 
>>> process?
>> 
>> Yes, that’s correct. The reason for moving in that direction is that the 
>> resource managers, as they continue to
>> integrate PMIx into them, are going to be providing that third party. This 
>> will make connect/accept much easier to use,
>> and a great deal more scalable.
>> 
>> See https://github.com/pmix/RFCs/blob/master/RFC0003.md for an explanation.
> 
> 
> Ok, thanks for that input. I haven't heard of pmix so far (only as part of 
> some ompi error messages).
> 
> Using ompi-server -d -r 'ompi.connect' I was able to publish and retrieve the 
> port name, however, still no connection
> could be established.
> 
> % mpirun -n 1 --ompi-server "file:ompi.connect" ./a.out A
> Published port 3044605953.0:664448538
> 
> % mpirun -n 1 --ompi-server "file:ompi.connect" ./a.out B
> Looked up port 3044605953.0:664448538
> 
> 
> at this point, both processes hang.
> 
> The code is:
> 
> #include <mpi.h>
> #include <cstdio>
> #include <string>
> 
> int main(int argc, char **argv)
> {
>   MPI_Init(&argc, &argv);
>   std::string a(argv[1]);
>   char p[MPI_MAX_PORT_NAME];
>   MPI_Comm icomm;
> 
>   if (a == "A") {
>     MPI_Open_port(MPI_INFO_NULL, p);
>     MPI_Publish_name("foobar", MPI_INFO_NULL, p);
>     printf("Published port %s\n", p);
>     MPI_Comm_accept(p, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &icomm);
>   }
>   if (a == "B") {
>     MPI_Lookup_name("foobar", MPI_INFO_NULL, p);
>     printf("Looked up port %s\n", p);
>     MPI_Comm_connect(p, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &icomm);
>   }
> 
>   MPI_Finalize();
> 
>   return 0;
> }
> 
> 
> 
> Do you have any idea?
> 
> Best,
> Florian
> ___
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users


Re: [OMPI users] Can't connect using MPI Ports

2017-11-09 Thread Florian Lindner
>> The MPI Ports functionality (chapter 10.4 of MPI 3.1), mainly consisting of 
>> MPI_Open_port, MPI_Comm_accept and
>> MPI_Comm_connect is not usable without running an ompi-server as a third 
>> process?
> 
> Yes, that’s correct. The reason for moving in that direction is that the 
> resource managers, as they continue to
> integrate PMIx into them, are going to be providing that third party. This 
> will make connect/accept much easier to use,
> and a great deal more scalable.
> 
> See https://github.com/pmix/RFCs/blob/master/RFC0003.md for an explanation.


Ok, thanks for that input. I haven't heard of pmix so far (only as part of some 
ompi error messages).

Using ompi-server -d -r 'ompi.connect' I was able to publish and retrieve the 
port name, however, still no connection
could be established.

% mpirun -n 1 --ompi-server "file:ompi.connect" ./a.out A
Published port 3044605953.0:664448538

% mpirun -n 1 --ompi-server "file:ompi.connect" ./a.out B
Looked up port 3044605953.0:664448538


at this point, both processes hang.

The code is:

#include <mpi.h>
#include <cstdio>
#include <string>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);
  std::string a(argv[1]);
  char p[MPI_MAX_PORT_NAME];
  MPI_Comm icomm;

  if (a == "A") {
    MPI_Open_port(MPI_INFO_NULL, p);
    MPI_Publish_name("foobar", MPI_INFO_NULL, p);
    printf("Published port %s\n", p);
    MPI_Comm_accept(p, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &icomm);
  }
  if (a == "B") {
    MPI_Lookup_name("foobar", MPI_INFO_NULL, p);
    printf("Looked up port %s\n", p);
    MPI_Comm_connect(p, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &icomm);
  }

  MPI_Finalize();

  return 0;
}



Do you have any idea?

Best,
Florian

Re: [OMPI users] Can't connect using MPI Ports

2017-11-06 Thread r...@open-mpi.org

> On Nov 6, 2017, at 7:46 AM, Florian Lindner  wrote:
> 
> On 05.11.2017 at 20:57, r...@open-mpi.org wrote:
>> 
>>> On Nov 5, 2017, at 6:48 AM, Florian Lindner wrote:
>>> 
>>> On 04.11.2017 at 00:05, r...@open-mpi.org wrote:
>>>> Yeah, there isn’t any way that is going to work in the 2.x series. I’m not 
>>>> sure it was ever fixed, but you might try
>>>> the latest 3.0, the 3.1rc, and even master.
>>>> 
>>>> The only methods that are known to work are:
>>>> 
>>>> * connecting processes within the same mpirun - e.g., using comm_spawn
>>> 
>>> That is not an option for our application.
>>> 
>>>> * connecting processes across different mpiruns, with the ompi-server 
>>>> daemon as the rendezvous point
>>>> 
>>>> The old command line method (i.e., what you are trying to use) hasn’t been 
>>>> much on the radar. I don’t know if someone
>>>> else has picked it up or not...
>>> 
>>> What do you mean by "the old command line method"?
>>> 
>>> Isn't the ompi-server just another means of exchanging port names, i.e. the 
>>> same I do using files?
>> 
>> No, it isn’t - there is a handshake that ompi-server facilitates.
>> 
>>> 
>>> In my understanding, using Publish_name and Lookup_name or exchanging the 
>>> information using files (or command line or
>>> stdin) shouldn't have any
>>> impact on the connection (Connect / Accept) itself.
>> 
>> Depends on the implementation underneath connect/accept.
>> 
>> The initial MPI standard authors had fixed in their minds that the 
>> connect/accept handshake would take place over a TCP
>> socket, and so no intermediate rendezvous broker was involved. That isn’t 
>> how we’ve chosen to implement it this time
>> around, and so you do need the intermediary. If/when some developer wants to 
>> add another method, they are welcome to do
>> so - but the general opinion was that the broker requirement was fine.
> 
> Ok. Just to make sure I understood correctly:
> 
> The MPI Ports functionality (chapter 10.4 of MPI 3.1), mainly consisting of 
> MPI_Open_port, MPI_Comm_accept and
> MPI_Comm_connect is not usable without running an ompi-server as a third 
> process?

Yes, that’s correct. The reason for moving in that direction is that the 
resource managers, as they continue to integrate PMIx into them, are going to 
be providing that third party. This will make connect/accept much easier to 
use, and a great deal more scalable.

See https://github.com/pmix/RFCs/blob/master/RFC0003.md for an explanation.

> 
> Thanks again,
> Florian

Re: [OMPI users] Can't connect using MPI Ports

2017-11-06 Thread Florian Lindner
On 05.11.2017 at 20:57, r...@open-mpi.org wrote:
> 
>> On Nov 5, 2017, at 6:48 AM, Florian Lindner wrote:
>>
>> On 04.11.2017 at 00:05, r...@open-mpi.org wrote:
>>> Yeah, there isn’t any way that is going to work in the 2.x series. I’m not 
>>> sure it was ever fixed, but you might try
>>> the latest 3.0, the 3.1rc, and even master.
>>>
>>> The only methods that are known to work are:
>>>
>>> * connecting processes within the same mpirun - e.g., using comm_spawn
>>
>> That is not an option for our application.
>>
>>> * connecting processes across different mpiruns, with the ompi-server 
>>> daemon as the rendezvous point
>>>
>>> The old command line method (i.e., what you are trying to use) hasn’t been 
>>> much on the radar. I don’t know if someone
>>> else has picked it up or not...
>>
>> What do you mean by "the old command line method"?
>>
>> Isn't the ompi-server just another means of exchanging port names, i.e. the 
>> same I do using files?
> 
> No, it isn’t - there is a handshake that ompi-server facilitates.
> 
>>
>> In my understanding, using Publish_name and Lookup_name or exchanging the 
>> information using files (or command line or
>> stdin) shouldn't have any
>> impact on the connection (Connect / Accept) itself.
> 
> Depends on the implementation underneath connect/accept.
> 
> The initial MPI standard authors had fixed in their minds that the 
> connect/accept handshake would take place over a TCP
> socket, and so no intermediate rendezvous broker was involved. That isn’t how 
> we’ve chosen to implement it this time
> around, and so you do need the intermediary. If/when some developer wants to 
> add another method, they are welcome to do
> so - but the general opinion was that the broker requirement was fine.

Ok. Just to make sure I understood correctly:

The MPI Ports functionality (chapter 10.4 of MPI 3.1), mainly consisting of 
MPI_Open_port, MPI_Comm_accept and
MPI_Comm_connect is not usable without running an ompi-server as a third 
process?

Thanks again,
Florian

Re: [OMPI users] Can't connect using MPI Ports

2017-11-05 Thread r...@open-mpi.org

> On Nov 5, 2017, at 6:48 AM, Florian Lindner  wrote:
> 
> On 04.11.2017 at 00:05, r...@open-mpi.org wrote:
>> Yeah, there isn’t any way that is going to work in the 2.x series. I’m not 
>> sure it was ever fixed, but you might try the latest 3.0, the 3.1rc, and 
>> even master.
>> 
>> The only methods that are known to work are:
>> 
>> * connecting processes within the same mpirun - e.g., using comm_spawn
> 
> That is not an option for our application.
> 
>> * connecting processes across different mpiruns, with the ompi-server daemon 
>> as the rendezvous point
>> 
>> The old command line method (i.e., what you are trying to use) hasn’t been 
>> much on the radar. I don’t know if someone else has picked it up or not...
> 
> What do you mean by "the old command line method"?
> 
> Isn't the ompi-server just another means of exchanging port names, i.e. the 
> same I do using files?

No, it isn’t - there is a handshake that ompi-server facilitates.

> 
> In my understanding, using Publish_name and Lookup_name or exchanging the 
> information using files (or command line or stdin) shouldn't have any
> impact on the connection (Connect / Accept) itself.

Depends on the implementation underneath connect/accept.

The initial MPI standard authors had fixed in their minds that the 
connect/accept handshake would take place over a TCP socket, and so no 
intermediate rendezvous broker was involved. That isn’t how we’ve chosen to 
implement it this time around, and so you do need the intermediary. If/when 
some developer wants to add another method, they are welcome to do so - but the 
general opinion was that the broker requirement was fine.
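
One way to picture why knowing the port string alone is not enough is a toy model in plain C++. This is only an illustration under stated assumptions, not Open MPI or PMIx code: the Rendezvous class and the post and findPeer functions are hypothetical names, and the model assumes that each side posts its contact information under the port name and then looks up its peer in the same store. If every mpirun only has its own store, neither lookup can match, which would be consistent with the timeouts reported in this thread.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a shared publish/lookup service, i.e. the role
// that ompi-server plays between two separate mpirun invocations.
class Rendezvous {
public:
  void publish(const std::string &key, const std::string &value)
  {
    store_[key] = value;
  }

  // Returns true and fills 'value' if 'key' was published in this store.
  bool lookup(const std::string &key, std::string &value) const
  {
    const auto it = store_.find(key);
    if (it == store_.end())
      return false;
    value = it->second;
    return true;
  }

private:
  std::map<std::string, std::string> store_;
};

// Assumption: each side posts its own contact info under the port name ...
void post(Rendezvous &r, const std::string &port, const std::string &side,
          const std::string &contact)
{
  r.publish(port + "/" + side, contact);
}

// ... and then looks for the contact info of the other side.
bool findPeer(const Rendezvous &r, const std::string &port,
              const std::string &peerSide, std::string &contact)
{
  return r.lookup(port + "/" + peerSide, contact);
}
```

With two separate Rendezvous objects (one per mpirun), findPeer fails on both sides even though both know the same port string. With a single shared object, both lookups succeed.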

> 
> Best,
> Florian
> 
> 
>> Ralph
>> 
>>> On Nov 3, 2017, at 11:23 AM, Florian Lindner  wrote:
>>> 
>>> 
>>> On 03.11.2017 at 16:18, r...@open-mpi.org wrote:
>>>> What version of OMPI are you using?
>>> 
>>> 2.1.1 @ Arch Linux.
>>> 
>>> Best,
>>> Florian

Re: [OMPI users] Can't connect using MPI Ports

2017-11-05 Thread Florian Lindner
On 04.11.2017 at 00:05, r...@open-mpi.org wrote:
> Yeah, there isn’t any way that is going to work in the 2.x series. I’m not 
> sure it was ever fixed, but you might try the latest 3.0, the 3.1rc, and even 
> master.
> 
> The only methods that are known to work are:
> 
> * connecting processes within the same mpirun - e.g., using comm_spawn

That is not an option for our application.

> * connecting processes across different mpiruns, with the ompi-server daemon 
> as the rendezvous point
> 
> The old command line method (i.e., what you are trying to use) hasn’t been 
> much on the radar. I don’t know if someone else has picked it up or not...

What do you mean by "the old command line method"?

Isn't the ompi-server just another means of exchanging port names, i.e. the 
same I do using files?

In my understanding, using Publish_name and Lookup_name or exchanging the 
information using files (or command line or stdin) shouldn't have any
impact on the connection (Connect / Accept) itself.

Best,
Florian


> Ralph
> 
>> On Nov 3, 2017, at 11:23 AM, Florian Lindner  wrote:
>>
>>
>> On 03.11.2017 at 16:18, r...@open-mpi.org wrote:
>>> What version of OMPI are you using?
>>
>> 2.1.1 @ Arch Linux.
>>
>> Best,
>> Florian

Re: [OMPI users] Can't connect using MPI Ports

2017-11-03 Thread r...@open-mpi.org
Yeah, there isn’t any way that is going to work in the 2.x series. I’m not sure 
it was ever fixed, but you might try the latest 3.0, the 3.1rc, and even master.

The only methods that are known to work are:

* connecting processes within the same mpirun - e.g., using comm_spawn

* connecting processes across different mpiruns, with the ompi-server daemon as 
the rendezvous point

The old command line method (i.e., what you are trying to use) hasn’t been much 
on the radar. I don’t know if someone else has picked it up or not...
Ralph

> On Nov 3, 2017, at 11:23 AM, Florian Lindner  wrote:
> 
> 
> On 03.11.2017 at 16:18, r...@open-mpi.org wrote:
>> What version of OMPI are you using?
> 
> 2.1.1 @ Arch Linux.
> 
> Best,
> Florian

Re: [OMPI users] Can't connect using MPI Ports

2017-11-03 Thread Florian Lindner

On 03.11.2017 at 16:18, r...@open-mpi.org wrote:
> What version of OMPI are you using?

2.1.1 @ Arch Linux.

Best,
Florian


Re: [OMPI users] Can't connect using MPI Ports

2017-11-03 Thread r...@open-mpi.org
What version of OMPI are you using?

> On Nov 3, 2017, at 7:48 AM, Florian Lindner  wrote:
> 
> Hello,
> 
> I'm working on a sample program to connect two MPI communicators launched 
> with mpirun using Ports.
> 
> Firstly, I use MPI_Open_port to obtain a name and write that to a file:
> 
>   if (options.participant == A) { // A publishes the port
>     if (options.commType == single and rank == 0)
>       openPublishPort(options);
> 
>     if (options.commType == many)
>       openPublishPort(options);
>   }
>   MPI_Barrier(MPI_COMM_WORLD);
> 
> participant is a command line argument and defines the role of A as server. B 
> is the client.
> 
> void openPublishPort(Options options)
> {
>   using namespace boost::filesystem;
>   int rank;
>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> 
>   char p[MPI_MAX_PORT_NAME];
>   MPI_Open_port(MPI_INFO_NULL, p);
>   std::string portName(p);
> 
>   create_directory(options.publishDirectory);
>   std::string filename;
>   if (options.commType == many)
>     filename = "A-" + std::to_string(rank) + ".address";
>   if (options.commType == single)
>     filename = "intercomm.address";
> 
>   auto path = options.publishDirectory / filename;
>   DEBUG << "Writing address " << portName << " to " << path;
>   std::ofstream ofs(path.string(), std::ofstream::out);
>   ofs << portName;
> }
> 
> This works fine as far as I see. Next, I try to connect:
> 
>   MPI_Comm icomm;
>   std::string portName;
>   if (options.participant == A) { // receives connections
>     if (options.commType == single) {
>       if (rank == 0)
>         portName = readPort(options);
>       INFO << "Accepting connection on " << portName;
>       MPI_Comm_accept(portName.c_str(), MPI_INFO_NULL, 0, MPI_COMM_WORLD, &icomm);
>       INFO << "Received connection";
>     }
>   }
> 
>   if (options.participant == B) { // connects to the intercomms
>     if (options.commType == single) {
>       if (rank == 0)
>         portName = readPort(options);
>       INFO << "Trying to connect to " << portName;
>       MPI_Comm_connect(portName.c_str(), MPI_INFO_NULL, 0, MPI_COMM_WORLD, &icomm);
>       INFO << "Connected";
>     }
>   }
> 
> 
> options.single says that I want to use a single communicator that contains 
> all ranks on both participants, A and B.
> readPort reads the port name from the file that was written before.
> 
> Now, when I first launch A and, in another terminal, B, nothing happens until 
> a timeout occurs.
> 
> % mpirun -n 1 ./mpiports --commType="single" --participant="A"
> [2017-11-03 15:29:55.469891] [debug]   Writing address 
> 3048013825.0:1069313090 to "./publish/intercomm.address"
> [2017-11-03 15:29:55.470169] [debug]   Read address 3048013825.0:1069313090 
> from "./publish/intercomm.address"
> [2017-11-03 15:29:55.470185] [info]Accepting connection on 
> 3048013825.0:1069313090
> [asaru:16199] OPAL ERROR: Timeout in file base/pmix_base_fns.c at line 195
> [...]
> 
> and on the other side:
> 
> % mpirun -n 1 ./mpiports --commType="single" --participant="B"
> [2017-11-03 15:29:59.698921] [debug]   Read address 3048013825.0:1069313090 
> from "./publish/intercomm.address"
> [2017-11-03 15:29:59.698947] [info]Trying to connect to 
> 3048013825.0:1069313090
> [asaru:16238] OPAL ERROR: Timeout in file base/pmix_base_fns.c at line 195
> [...]
> 
> The complete code, including cmake build script can be downloaded at:
> 
> https://www.dropbox.com/s/azo5ti4kjg12zjy/MPI_Ports.tar.gz?dl=0
> 
> Why is the connection not working?
> 
> Thanks a lot,
> Florian
> 
> 


[OMPI users] Can't connect using MPI Ports

2017-11-03 Thread Florian Lindner
Hello,

I'm working on a sample program to connect two MPI communicators launched with 
mpirun using Ports.

Firstly, I use MPI_Open_port to obtain a name and write that to a file:

  if (options.participant == A) { // A publishes the port
    if (options.commType == single and rank == 0)
      openPublishPort(options);

    if (options.commType == many)
      openPublishPort(options);
  }
  MPI_Barrier(MPI_COMM_WORLD);

participant is a command line argument and defines the role of A as server. B 
is the client.

void openPublishPort(Options options)
{
  using namespace boost::filesystem;
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  char p[MPI_MAX_PORT_NAME];
  MPI_Open_port(MPI_INFO_NULL, p);
  std::string portName(p);

  create_directory(options.publishDirectory);
  std::string filename;
  if (options.commType == many)
    filename = "A-" + std::to_string(rank) + ".address";
  if (options.commType == single)
    filename = "intercomm.address";

  auto path = options.publishDirectory / filename;
  DEBUG << "Writing address " << portName << " to " << path;
  std::ofstream ofs(path.string(), std::ofstream::out);
  ofs << portName;
}

This works fine as far as I see. Next, I try to connect:

  MPI_Comm icomm;
  std::string portName;
  if (options.participant == A) { // receives connections
    if (options.commType == single) {
      if (rank == 0)
        portName = readPort(options);
      INFO << "Accepting connection on " << portName;
      MPI_Comm_accept(portName.c_str(), MPI_INFO_NULL, 0, MPI_COMM_WORLD, &icomm);
      INFO << "Received connection";
    }
  }

  if (options.participant == B) { // connects to the intercomms
    if (options.commType == single) {
      if (rank == 0)
        portName = readPort(options);
      INFO << "Trying to connect to " << portName;
      MPI_Comm_connect(portName.c_str(), MPI_INFO_NULL, 0, MPI_COMM_WORLD, &icomm);
      INFO << "Connected";
    }
  }


options.single says that I want to use a single communicator that contains all 
ranks on both participants, A and B.
readPort reads the port name from the file that was written before.
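
The readPort helper itself is not shown in this message. As a rough sketch of the file exchange only, the following uses nothing but the standard library. The names writePortFile and readPortFile, the ".tmp" suffix, the polling interval and the timeout parameter are all assumptions made for illustration (the real readPort takes the Options object), and the target directory is assumed to exist already. Writing to a temporary file and renaming it is one way to keep a reader from seeing a half-written port name.

```cpp
#include <chrono>
#include <cstdio>
#include <fstream>
#include <string>
#include <thread>

// Hypothetical helper: write the port name to "<file>.tmp" first and rename it
// into place afterwards, so the final file only appears once it is complete.
bool writePortFile(const std::string &file, const std::string &portName)
{
  const std::string tmp = file + ".tmp";
  {
    std::ofstream ofs(tmp, std::ofstream::out | std::ofstream::trunc);
    if (!ofs)
      return false;
    ofs << portName;
  } // the stream is flushed and closed here, before the rename
  return std::rename(tmp.c_str(), file.c_str()) == 0;
}

// Hypothetical helper: poll until the file can be read or the timeout expires.
// Returns true and stores the first line of the file in 'portName' on success.
bool readPortFile(const std::string &file, std::chrono::milliseconds timeout,
                  std::string &portName)
{
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  for (;;) {
    std::ifstream ifs(file);
    if (ifs && std::getline(ifs, portName))
      return true;
    if (std::chrono::steady_clock::now() >= deadline)
      return false;
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
  }
}
```

In this sketch, rank 0 of A would call writePortFile after MPI_Open_port, and rank 0 of B would call readPortFile before MPI_Comm_connect. This only moves the port string from one program to the other.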

Now, when I first launch A and, in another terminal, B, nothing happens until a 
timeout occurs.

% mpirun -n 1 ./mpiports --commType="single" --participant="A"
[2017-11-03 15:29:55.469891] [debug]   Writing address 3048013825.0:1069313090 
to "./publish/intercomm.address"
[2017-11-03 15:29:55.470169] [debug]   Read address 3048013825.0:1069313090 
from "./publish/intercomm.address"
[2017-11-03 15:29:55.470185] [info]Accepting connection on 
3048013825.0:1069313090
[asaru:16199] OPAL ERROR: Timeout in file base/pmix_base_fns.c at line 195
[...]

and on the other side:

% mpirun -n 1 ./mpiports --commType="single" --participant="B"
[2017-11-03 15:29:59.698921] [debug]   Read address 3048013825.0:1069313090 
from "./publish/intercomm.address"
[2017-11-03 15:29:59.698947] [info]Trying to connect to 
3048013825.0:1069313090
[asaru:16238] OPAL ERROR: Timeout in file base/pmix_base_fns.c at line 195
[...]

The complete code, including cmake build script can be downloaded at:

https://www.dropbox.com/s/azo5ti4kjg12zjy/MPI_Ports.tar.gz?dl=0

Why is the connection not working?

Thanks a lot,
Florian

