Re: [OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-24 Thread Gilles Gouaillardet via users
Sorry if I did not make my intent clear.

I was basically suggesting hacking the Open MPI and PMIx hostname wrappers 
(opal_net_get_hostname() and pmix_net_get_hostname()) so that they remove 
the problematic underscores and make the regx components a happy panda again.
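
As a rough illustration of what such a hack would do to the names, here is a small shell sketch. It does not touch Open MPI or PMIx; the helper names strip_underscores and colliding_names and the sample node names are hypothetical. It only shows the stripped form of a name and flags cases where two different nodes would end up with the same name once the underscores are removed, which is worth checking before patching anything.

```shell
# Illustrative helpers only; these are not part of Open MPI or PMIx.

# Print the given host name with every underscore removed.
strip_underscores() {
    printf '%s\n' "$1" | tr -d '_'
}

# Read host names on stdin (one per line) and print each stripped name
# that more than one original name would map to.
colliding_names() {
    tr -d '_' | sort | uniq -d
}

# Made-up node names, for demonstration.
strip_underscores "gpu_node_7"
printf '%s\n' "node_01" "node01" "node_02" | colliding_names
```

With the made-up names above, the first call prints gpunode7 and the second prints node01, because node_01 and node01 collapse to the same string.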

Cheers,

Gilles



Re: [OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-24 Thread Jeff Squyres (jsquyres) via users
I think the files suggested by Gilles are more about the underlying call to get 
the hostname; those won't be problematic.

The Open MPI regx components are where Open MPI is running into a problem with 
your hostnames (i.e., your hostnames don't fit Open MPI's expectations of the 
hostname format).  I'm surprised that using the naive component (instead of 
the fwd component) doesn't solve your problem.  ...oh shoot, I see why.  It's 
because I had a typo in what I suggested to you.

Please try:  mpirun --mca regx naive ...

(i.e., "regx", not "regex")
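
For reference, a minimal sketch of an equivalent way to make the same selection, assuming a standard Open MPI 4.0.x installation: a parameter passed with --mca can generally also be set through an environment variable carrying the OMPI_MCA_ prefix. The application name ./my_app, the hostfile name, and the process count are placeholders, and the mpirun lines are left commented out because they need a working cluster.

```shell
# Select the "naive" regx component for every mpirun started from this shell.
# This is meant to be equivalent to passing "--mca regx naive" each time.
export OMPI_MCA_regx=naive

# Placeholder launch lines (./my_app and hostfile are hypothetical):
# mpirun --mca regx naive -np 4 --hostfile hostfile ./my_app
# mpirun -np 4 --hostfile hostfile ./my_app    # picks up OMPI_MCA_regx

# Optionally list the regx components this installation provides:
# ompi_info | grep regx

# Confirm what the environment currently requests.
echo "regx component requested: ${OMPI_MCA_regx}"
```

The environment variable is convenient in batch scripts where the mpirun line is generated elsewhere. The explicit --mca form is easier to spot when debugging.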

--
Jeff Squyres
jsquy...@cisco.com




Re: [OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-21 Thread Patrick Begou via users

Hi Jeff,

Unfortunately the workaround with "--mca regex naive" does not change 
the behaviour. I'm going to investigate the Open MPI source files as 
suggested by Gilles.

Patrick







Re: [OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-16 Thread Jeff Squyres (jsquyres) via users
Ah; this is a slightly different error than what Gilles was guessing from your 
prior description.  This is what you're running into: 
https://github.com/open-mpi/ompi/blob/v4.0.x/orte/mca/regx/fwd/regx_fwd.c#L130-L134

Try running with:

mpirun --mca regex naive ...

Specifically: the "fwd" regex component is selected by default, but it has 
certain expectations about the format of hostnames.  Try using the "naive" 
regex component, instead.

-- 
Jeff Squyres
jsquy...@cisco.com








Re: [OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-16 Thread George Bosilca via users
This error seems to originate in the PMIx regex framework. I'm not sure
exactly which component is used, but a good starting point is the files
in 3rd-party/openpmix/src/mca/preg/. Look for the generate_node_regex
function in the different components; one of them is raising the error.

George.




Re: [OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-16 Thread Patrick Begou via users

Hi  Gilles and Jeff,

@Gilles I will have a look at these files, thanks.

@Jeff this is the error message (screen dump attached) and, of course, the 
node names do not comply with the standard.


Patrick







Re: [OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-16 Thread Jeff Squyres (jsquyres) via users
What exactly is the error that is occurring?

--
Jeff Squyres
jsquy...@cisco.com






Re: [OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-16 Thread Gilles Gouaillardet via users
Patrick,

You will likely also need to apply the same hack to opal_net_get_hostname()
in opal/util/net.c


Cheers,

Gilles



Re: [OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-16 Thread Gilles Gouaillardet via users
Patrick,

I am not sure Open MPI can do that out of the box.

Maybe hacking pmix_net_get_hostname() in
opal/mca/pmix/pmix3x/pmix/src/util/net.c can do the trick.


Cheers,

Gilles



[OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-16 Thread Patrick Begou via users

Hi all,

We are facing a serious problem with Open MPI (4.0.2), which we have 
deployed on a cluster. We do not manage this large cluster, and the names 
of the nodes do not comply with Internet standards for host names: they 
contain a "_" (underscore) character.
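
A quick way to see which entries are affected is to scan the host file for underscores. The sketch below assumes the usual host file layout (host name in the first field, optional "slots=N" after it, "#" for comments). The helper name find_underscore_hosts and the sample node names are made up for illustration.

```shell
# Hypothetical helper: print the host name (first field) of every hostfile
# entry that contains an underscore. Blank lines and '#' comments are skipped.
find_underscore_hosts() {
    awk '!/^[ \t]*#/ && NF > 0 && $1 ~ /_/ { print $1 }' "$1"
}

# Example with a throwaway hostfile; the node names are made up.
sample_hostfile=$(mktemp)
printf '%s\n' '# sample hostfile' 'node_01 slots=4' 'node02 slots=4' > "$sample_hostfile"
find_underscore_hosts "$sample_hostfile"
rm -f "$sample_hostfile"
```

For the sample file this prints node_01 only, since node02 has no underscore and the comment line is ignored.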


So Open MPI complains about this and does not run.

I've tried using IP addresses instead of host names in the host file, 
without any success.

Is there a known workaround for this? Asking the administrators to 
change the node names on this large cluster may be difficult.


Thanks

Patrick