Re: [OMPI users] Intercommunicator issue (any standard about communicator?)

2022-06-24 Thread Protze, Joachim via users
Hi Gilles,

MPI provides explicit comparison functions for opaque handles; for 
communicators, it is MPI_Comm_compare.

- Joachim

From: users  on behalf of Guillaume De Nayer 
via users 
Sent: Friday, June 24, 2022 5:13:17 PM
To: users@lists.open-mpi.org 
Cc: Guillaume De Nayer 
Subject: Re: [OMPI users] Intercommunicator issue (any standard about 
communicator?)

Thank you for this information!


Re: [OMPI users] Intercommunicator issue (any standard about communicator?)

2022-06-24 Thread Guillaume De Nayer via users
Thank you for this information!

On 06/24/2022 05:06 PM, Jeff Squyres (jsquyres) via users wrote:
> Open MPI and MPICH are completely unrelated -- we're entirely different code 
> bases (note that Intel MPI is derived from MPICH).
> 
> Case in point is what Gilles cited: Open MPI chose to implement MPI_Comm 
> handles as pointers, but MPICH chose to implement MPI_Comm handles as 
> integers.  Hence, you can't really compare the MPI_Comm values from Open MPI 
> vs. MPI_Comm values from MPICH/Intel MPI -- they're fundamentally 
> representing different things.
> 
> The MPI standard doesn't say anything about the values of MPI handles (e.g., 
> MPI_Comm handles).  They're just a value that a user program can pass around. 
>  When that handle is given to the MPI implementation (e.g., by passing it to 
> MPI_Send() or other MPI API), the only rule is that the MPI implementation 
> has to be able to map that handle into whatever back end data structures are 
> relevant to implement the concept of an MPI communicator.  Hence: the handle 
> is meaningless to the application -- it's just an opaque value that the user 
> program can pass around.
> 
> User applications *can* compare it to the value for MPI_COMM_NULL, but that's 
> about it.
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> 
> 




Re: [OMPI users] Intercommunicator issue (any standard about communicator?)

2022-06-24 Thread Jeff Squyres (jsquyres) via users
Open MPI and MPICH are completely unrelated -- we're entirely different code 
bases (note that Intel MPI is derived from MPICH).

Case in point is what Gilles cited: Open MPI chose to implement MPI_Comm 
handles as pointers, but MPICH chose to implement MPI_Comm handles as integers. 
 Hence, you can't really compare the MPI_Comm values from Open MPI vs. MPI_Comm 
values from MPICH/Intel MPI -- they're fundamentally representing different 
things.

The MPI standard doesn't say anything about the values of MPI handles (e.g., 
MPI_Comm handles).  They're just a value that a user program can pass around.  
When that handle is given to the MPI implementation (e.g., by passing it to 
MPI_Send() or other MPI API), the only rule is that the MPI implementation has 
to be able to map that handle into whatever back end data structures are 
relevant to implement the concept of an MPI communicator.  Hence: the handle is 
meaningless to the application -- it's just an opaque value that the user 
program can pass around.

User applications *can* compare it to the value for MPI_COMM_NULL, but that's 
about it.
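
The mapping described above can be pictured with a toy handle table. This is a minimal sketch, not how Open MPI or MPICH actually store communicators: `ToyComm`, `ToyHandle`, `TOY_COMM_NULL`, and `HandleTable` are hypothetical names invented for illustration. The application only ever holds the integer handle; the table turns it back into the back-end structure.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical back-end object standing in for a library's communicator data.
struct ToyComm {
    std::string name;
    int size;
};

using ToyHandle = std::int32_t;         // what the application passes around
constexpr ToyHandle TOY_COMM_NULL = 0;  // the one value an application may compare against

class HandleTable {
public:
    // Register a back-end object and hand out an opaque handle for it.
    ToyHandle add(const ToyComm& comm) {
        ToyHandle h = next_++;
        table_[h] = comm;
        return h;
    }

    // Map a handle back to the back-end object; nullptr if the handle is unknown.
    const ToyComm* lookup(ToyHandle h) const {
        auto it = table_.find(h);
        return it == table_.end() ? nullptr : &it->second;
    }

private:
    ToyHandle next_ = 1;  // start at 1 so no valid handle equals TOY_COMM_NULL
    std::unordered_map<ToyHandle, ToyComm> table_;
};
```

The numeric value returned by `add` tells the caller nothing by itself; only `lookup`, which lives inside the (toy) library, gives it meaning. A pointer-based implementation would hand out an address instead of a table key, which is why handle values from different implementations cannot be compared.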

--
Jeff Squyres
jsquy...@cisco.com


From: users  on behalf of Guillaume De Nayer 
via users 
Sent: Friday, June 24, 2022 8:29 AM
To: users@lists.open-mpi.org
Cc: Guillaume De Nayer
Subject: Re: [OMPI users] Intercommunicator issue (any standard about 
communicator?)

Hi Gilles,

I'm using both Open MPI and Intel MPI, and I have problems with the
communicators in both. Therefore, I tried to get some information about them.

Thx a lot for your help.
Have a nice day





Re: [OMPI users] Intercommunicator issue (any standard about communicator?)

2022-06-24 Thread Guillaume De Nayer via users
Hi Gilles,

I'm using both Open MPI and Intel MPI, and I have problems with the
communicators in both. Therefore, I tried to get some information about them.

Thx a lot for your help.
Have a nice day

On 06/24/2022 02:14 PM, Gilles Gouaillardet via users wrote:
> Guillaume,
> 
> MPI_Comm is an opaque handle that should not be interpreted by an end user.
> 
> Open MPI chose to implement it as an opaque pointer, and MPICH chose to
> implement it as a 32-bit integer.
> The 4400 value strongly suggests you are using MPICH and you are
> hence posting to the wrong mailing list.
> 
> 
> Cheers,
> 
> Gilles
> 




Re: [OMPI users] Intercommunicator issue (any standard about communicator?)

2022-06-24 Thread Gilles Gouaillardet via users
Guillaume,

MPI_Comm is an opaque handle that should not be interpreted by an end user.

Open MPI chose to implement it as an opaque pointer, and MPICH chose to
implement it as a 32-bit integer.
The 4400 value strongly suggests you are using MPICH and you are hence
posting to the wrong mailing list.


Cheers,

Gilles

On Fri, Jun 24, 2022 at 9:06 PM Guillaume De Nayer via users <
users@lists.open-mpi.org> wrote:

> Hi Gilles,
>
> MPI_COMM_WORLD is positive (4400).
>
> In a short program I wrote, I have something like this:
>
> MPI_Comm_dup(MPI_COMM_WORLD, &world);
> cout << "intra-communicator: " << "world" << "---" << hex << world << endl;
>
> It returns "8406" (in hex).
>
> Later I have:
>
> MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, world, &interClient);
> cout << "intercommunicator interClient=" << interClient << endl;
>
> After connection from a third party client it returns "c403" (in hex).
>
> Both 8406 and c403 are negative integers in decimal.
>
> I don't know if this is "normal". Therefore I'm looking for rules on
> communicators and intercommunicators.
>
> Regards,
> Guillaume
>
>


Re: [OMPI users] Intercommunicator issue (any standard about communicator?)

2022-06-24 Thread Guillaume De Nayer via users
On 06/24/2022 01:38 PM, Jeff Squyres (jsquyres) via users wrote:
> Guillaume --
> 
> There is an MPI Standard document that you can obtain from mpi-forum.org.  
> Open MPI v4.x adheres to MPI version 3.1 (the latest version of the MPI 
> standard is v4.0, but that is unrelated to Open MPI's version number).
> 

I have already downloaded it, but I did not find any rules on the value of a
communicator (I admit I did not read all 1200 pages carefully...).

- Does the communicator just have to be an integer? Can it be
positive or negative?
- It is initialized to MPI_COMM_NULL, which is invalid. Then, after
connect/accept, it gets a value and becomes valid.

> Frankly, Open MPI's support of the dynamic API functionality 
> (connect/accept/etc.) has always been a bit shaky; they have been tested to 
> work in very, very specific conditions, and not made super robust to work in 
> many different / generalized cases.  Is there a chance you can orient your 
> app to not use the MPI dynamic APIs?
> 

ok.

Thx for your reply.
Regards
Guillaume


> --
> Jeff Squyres
> jsquy...@cisco.com
> 
> 
> From: users  on behalf of Gilles 
> Gouaillardet via users 
> Sent: Friday, June 24, 2022 5:56 AM
> To: Open MPI Users
> Cc: Gilles Gouaillardet
> Subject: Re: [OMPI users] Intercommunicator issue (any standard about 
> communicator?)
> 
> Guillaume,
> 
> what do you mean by "the intercommunicators are all negative"?
> 
> 
> Cheers,
> 
> Gilles
> 
> 




Re: [OMPI users] Intercommunicator issue (any standard about communicator?)

2022-06-24 Thread Guillaume De Nayer via users
Hi Gilles,

MPI_COMM_WORLD is positive (4400).

In a short program I wrote, I have something like this:

MPI_Comm_dup(MPI_COMM_WORLD, &world);
cout << "intra-communicator: " << "world" << "---" << hex << world << endl;

It returns "8406" (in hex).

Later I have:

MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, world, &interClient);
cout << "intercommunicator interClient=" << interClient << endl;

After connection from a third party client it returns "c403" (in hex).

Both 8406 and c403 are negative integers in decimal.

I don't know if this is "normal". Therefore I'm looking for rules on
communicators and intercommunicators.
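
Whether a handle "looks negative" depends only on how its bits are printed. A minimal sketch, assuming the handle is a 32-bit integer; 0x84000006 is used purely as a hypothetical bit pattern with the top bit set, not as a value confirmed anywhere in this thread, and `as_signed` and `as_hex` are illustrative helper names:

```cpp
#include <cstdint>
#include <cstring>
#include <sstream>
#include <string>

// Reinterpret a 32-bit pattern as a signed integer without changing any bits.
std::int32_t as_signed(std::uint32_t bits) {
    std::int32_t value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Format the same 32-bit pattern as hexadecimal text.
std::string as_hex(std::uint32_t bits) {
    std::ostringstream out;
    out << std::hex << bits;
    return out.str();
}
```

With these helpers, `as_signed(0x84000006u)` is negative while `as_signed(0x44000000u)` is positive, even though both print as ordinary hexadecimal strings through `as_hex`. The sign is an artifact of the display format and says nothing about whether the communicator is valid.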

Regards,
Guillaume


On 06/24/2022 11:56 AM, Gilles Gouaillardet via users wrote:
> Guillaume,
> 
> what do you mean by "the intercommunicators are all negative"?
> 
> 
> Cheers,
> 
> Gilles
> 




Re: [OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-24 Thread Gilles Gouaillardet via users
Sorry if I did not make my intent clear.

I was basically suggesting to hack the Open MPI and PMIx wrappers to 
hostname() and remove the problematic underscores to make the regx 
components a happy panda again.
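
The string handling behind that idea can be sketched in isolation. This is only an illustration, not a patch against Open MPI or PMIx: `has_underscore` and `strip_underscores` are hypothetical helper names, and where such a call would be hooked into the hostname wrappers is left open.

```cpp
#include <algorithm>
#include <string>

// True if the name contains an underscore, the character identified
// in this thread as the problematic one.
bool has_underscore(const std::string& host) {
    return host.find('_') != std::string::npos;
}

// Return a copy of the name with every underscore removed.
std::string strip_underscores(std::string host) {
    host.erase(std::remove(host.begin(), host.end(), '_'), host.end());
    return host;
}
```

For a hypothetical node name, `strip_underscores("node_01")` yields `"node01"`. One caveat of this approach: two distinct names such as `a_1` and `a1` would collapse to the same string, so it is only safe when the stripped names stay unique across the cluster.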

Cheers,

Gilles

- Original Message -
> I think the files suggested by Gilles are more about the underlying
> call to get the hostname; those won't be problematic.
>
> The regex Open MPI modules are where Open MPI is running into a
> problem with your hostnames (i.e., your hostnames don't fit into Open
> MPI's expectations of the format of the hostname).  I'm surprised that
> using the naive module (instead of the fwd module) doesn't solve your
> problem.  ...oh shoot, I see why.  It's because I had a typo in what I
> suggested to you.
>
> Please try:  mpirun --mca regx naive ...
>
> (i.e., "regx", not "regex")
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> 
> 


Re: [OMPI users] Intercommunicator issue (any standard about communicator?)

2022-06-24 Thread Jeff Squyres (jsquyres) via users
Guillaume --

There is an MPI Standard document that you can obtain from mpi-forum.org.  Open 
MPI v4.x adheres to MPI version 3.1 (the latest version of the MPI standard is 
v4.0, but that is unrelated to Open MPI's version number).

Frankly, Open MPI's support of the dynamic API functionality 
(connect/accept/etc.) has always been a bit shaky; they have been tested to 
work in very, very specific conditions, and not made super robust to work in 
many different / generalized cases.  Is there a chance you can orient your app 
to not use the MPI dynamic APIs?

--
Jeff Squyres
jsquy...@cisco.com


From: users  on behalf of Gilles Gouaillardet 
via users 
Sent: Friday, June 24, 2022 5:56 AM
To: Open MPI Users
Cc: Gilles Gouaillardet
Subject: Re: [OMPI users] Intercommunicator issue (any standard about 
communicator?)

Guillaume,

what do you mean by "the intercommunicators are all negative"?


Cheers,

Gilles

On Fri, Jun 24, 2022 at 4:23 PM Guillaume De Nayer via users
<users@lists.open-mpi.org> wrote:
Hi,

I am new to this list. Let me introduce myself briefly: I am a
researcher in fluid mechanics. In this context I use software
based on MPI.

I am facing a problem:
- Three programs form a computational framework. Soft1 is a coupling
program, i.e., it opens an MPI port at the beginning. Soft2 and Soft3
are clients, which connect to the coupling program using MPI_Comm_connect.
- After startup and the connection of Soft2 and Soft3 to Soft1, it
hangs.

I started to debug this issue and, as usual, I found another issue (or
perhaps it is not an issue):
- The intercommunicators I get between Soft1-Soft2 and Soft1-Soft3 are
all negative (running on CentOS 7 with the InfiniBand Mellanox OFED driver).
- Is there some standard about communicators? I can't find anything about
this topic.
- What is a valid communicator or intercommunicator?

thx a lot
Regards
Guillaume



Re: [OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-24 Thread Jeff Squyres (jsquyres) via users
I think the files suggested by Gilles are more about the underlying call to get 
the hostname; those won't be problematic.

The regex Open MPI modules are where Open MPI is running into a problem with 
your hostnames (i.e., your hostnames don't fit into Open MPI's expectations of 
the format of the hostname).  I'm surprised that using the naive module 
(instead of the fwd module) doesn't solve your problem.  ...oh shoot, I see 
why.  It's because I had a typo in what I suggested to you.

Please try:  mpirun --mca regx naive ...

(i.e., "regx", not "regex")

--
Jeff Squyres
jsquy...@cisco.com


From: Patrick Begou 
Sent: Tuesday, June 21, 2022 12:10 PM
To: Jeff Squyres (jsquyres); Open MPI Users
Subject: Re: [OMPI users] OpenMPI and names of the nodes in a cluster

Hi Jeff,

Unfortunately the workaround with "--mca regex naive" does not change the 
behaviour. I'm going to investigate the OpenMPI source files as suggested by 
Gilles.

Patrick

Le 16/06/2022 à 17:43, Jeff Squyres (jsquyres) a écrit :

Ah; this is a slightly different error than what Gilles was guessing from your 
prior description.  This is what you're running in to: 
https://github.com/open-mpi/ompi/blob/v4.0.x/orte/mca/regx/fwd/regx_fwd.c#L130-L134

Try running with:

mpirun --mca regex naive ...

Specifically: the "fwd" regex component is selected by default, but it has 
certain expectations about the format of hostnames.  Try using the "naive" 
regex component, instead.

--
Jeff Squyres
jsquy...@cisco.com


From: Patrick Begou
Sent: Thursday, June 16, 2022 9:48 AM
To: Jeff Squyres (jsquyres); Open MPI Users
Subject: Re: [OMPI users] OpenMPI and names of the nodes in a cluster

Hi  Gilles and Jeff,

@Gilles I will have a look at these files, thanks.

@Jeff this is the error message (screen dump attached) and of course the node 
names do not agree with the standard.

Patrick

[cid:part1.KfzAgK4Q.PG6VadQJ@univ-grenoble-alpes.fr]

Le 16/06/2022 à 14:30, Jeff Squyres (jsquyres) a écrit :

What exactly is the error that is occurring?

--
Jeff Squyres
jsquy...@cisco.com


From: users on behalf of Patrick Begou via users
Sent: Thursday, June 16, 2022 3:21 AM
To: Open MPI Users
Cc: Patrick Begou
Subject: [OMPI users] OpenMPI and names of the nodes in a cluster

Hi all,

we are facing a serious problem with OpenMPI (4.0.2) that we have
deployed on a cluster. We do not manage this large cluster and the names
of the nodes do not agree with Internet standards for protocols: they
contain a "_" (underscore) character.

So OpenMPI complains about this and does not run.

I've tried to use IP addresses instead of host names in the host file
without any success.

Is there a known workaround for this? Requesting the administrators to
change the node names on this large cluster may be difficult.

Thanks

Patrick








[OMPI users] Intercommunicator issue (any standard about communicator?)

2022-06-24 Thread Guillaume De Nayer via users

Hi,

I am new to this list. Let me introduce myself briefly: I am a 
researcher in fluid mechanics. In this context I use software 
based on MPI.

I am facing a problem:
- Three programs form a computational framework. Soft1 is a coupling 
program, i.e., it opens an MPI port at the beginning. Soft2 and Soft3 
are clients, which connect to the coupling program using MPI_Comm_connect.
- After startup and the connection of Soft2 and Soft3 to Soft1, it 
hangs.

I started to debug this issue and, as usual, I found another issue (or 
perhaps it is not an issue):
- The intercommunicators I get between Soft1-Soft2 and Soft1-Soft3 are 
all negative (running on CentOS 7 with the InfiniBand Mellanox OFED driver).
- Is there some standard about communicators? I can't find anything about 
this topic.
- What is a valid communicator or intercommunicator?

thx a lot
Regards
Guillaume