Re: [OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-16 Thread Jeff Squyres (jsquyres) via users
Ah; this is a slightly different error than what Gilles was guessing from your 
prior description.  This is what you're running into: 
https://github.com/open-mpi/ompi/blob/v4.0.x/orte/mca/regx/fwd/regx_fwd.c#L130-L134

Try running with:

mpirun --mca regx naive ...

Specifically: the "fwd" regx component (orte/mca/regx, per the link above) is 
selected by default, but it has certain expectations about the format of 
hostnames.  Try using the "naive" regx component instead.
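
As a rough sketch of the same workaround without touching each command line: Open MPI also reads MCA parameters from environment variables named OMPI_MCA_<param>. The parameter name "regx" is taken from the framework directory in the link above (orte/mca/regx); the application name and process count below are hypothetical placeholders.

```shell
#!/bin/sh
# Select the "naive" regx component for every mpirun started from this shell.
# Assumption: the MCA parameter is named "regx", matching orte/mca/regx.
export OMPI_MCA_regx=naive

# Hypothetical application and process count, for illustration only.
APP=./my_mpi_app
NP=4

if command -v mpirun >/dev/null 2>&1 && [ -x "$APP" ]; then
    # Equivalent to: mpirun --mca regx naive -np "$NP" "$APP"
    mpirun -np "$NP" "$APP"
else
    echo "mpirun or $APP not found; would run: mpirun -np $NP $APP"
fi
```

Setting the variable in a job script keeps the workaround in one place; passing --mca on the command line is the better choice when only some runs should be affected.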

-- 
Jeff Squyres
jsquy...@cisco.com


From: Patrick Begou 
Sent: Thursday, June 16, 2022 9:48 AM
To: Jeff Squyres (jsquyres); Open MPI Users
Subject: Re: [OMPI users] OpenMPI and names of the nodes in a cluster

Hi  Gilles and Jeff,

@Gilles I will have a look at these files, thanks.

@Jeff this is the error message (screen dump attached) and of course the node 
names do not conform to the standard.

Patrick


On 16/06/2022 at 14:30, Jeff Squyres (jsquyres) wrote:

What exactly is the error that is occurring?

--
Jeff Squyres
jsquy...@cisco.com


From: users on behalf of Patrick Begou via users
Sent: Thursday, June 16, 2022 3:21 AM
To: Open MPI Users
Cc: Patrick Begou
Subject: [OMPI users] OpenMPI and names of the nodes in a cluster

Hi all,

we are facing a serious problem with OpenMPI (4.0.2) that we have
deployed on a cluster. We do not manage this large cluster and the names
of the nodes do not agree with Internet standards for protocols: they
contain a "_" (underscore) character.

So OpenMPI complains about this and does not run.

I've tried using IP addresses instead of host names in the host file,
without any success.

Is there a known workaround for this? Asking the administrators to
change the node names on this large cluster may be difficult.

Thanks

Patrick






Re: [OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-16 Thread George Bosilca via users
This error seems to be initiated from the PMIx regex framework. I am not sure
exactly which component is used, but a good starting point is one of the files
in 3rd-party/openpmix/src/mca/preg/. Look for the generate_node_regex
function in the different components; one of them is raising the error.

George.


On Thu, Jun 16, 2022 at 9:50 AM Patrick Begou via users <
users@lists.open-mpi.org> wrote:

> Hi  Gilles and Jeff,
>
> @Gilles I will have a look at these files, thanks.
>
> @Jeff this is the error message (screen dump attached) and of course the
> node names do not conform to the standard.
>
> Patrick
>
>
>
> On 16/06/2022 at 14:30, Jeff Squyres (jsquyres) wrote:
>
> What exactly is the error that is occurring?
>
> --
> Jeff Squyres
> jsquy...@cisco.com
>
> 
> From: users on behalf of Patrick Begou via users
> Sent: Thursday, June 16, 2022 3:21 AM
> To: Open MPI Users
> Cc: Patrick Begou
> Subject: [OMPI users] OpenMPI and names of the nodes in a cluster
>
> Hi all,
>
> we are facing a serious problem with OpenMPI (4.0.2) that we have
> deployed on a cluster. We do not manage this large cluster and the names
> of the nodes do not agree with Internet standards for protocols: they
> contain a "_" (underscore) character.
>
> So OpenMPI complains about this and does not run.
>
> I've tried using IP addresses instead of host names in the host file,
> without any success.
>
> Is there a known workaround for this? Asking the administrators to
> change the node names on this large cluster may be difficult.
>
> Thanks
>
> Patrick
>
>
>
>
>


Re: [OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-16 Thread Patrick Begou via users

Hi  Gilles and Jeff,

@Gilles I will have a look at these files, thanks.

@Jeff this is the error message (screen dump attached) and of course the 
node names do not conform to the standard.


Patrick



On 16/06/2022 at 14:30, Jeff Squyres (jsquyres) wrote:

What exactly is the error that is occurring?

--
Jeff Squyres
jsquy...@cisco.com


From: users on behalf of Patrick Begou via users
Sent: Thursday, June 16, 2022 3:21 AM
To: Open MPI Users
Cc: Patrick Begou
Subject: [OMPI users] OpenMPI and names of the nodes in a cluster

Hi all,

we are facing a serious problem with OpenMPI (4.0.2) that we have
deployed on a cluster. We do not manage this large cluster and the names
of the nodes do not agree with Internet standards for protocols: they
contain a "_" (underscore) character.

So OpenMPI complains about this and does not run.

I've tried using IP addresses instead of host names in the host file,
without any success.

Is there a known workaround for this? Asking the administrators to
change the node names on this large cluster may be difficult.

Thanks

Patrick




Re: [OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-16 Thread Jeff Squyres (jsquyres) via users
What exactly is the error that is occurring?

--
Jeff Squyres
jsquy...@cisco.com


From: users on behalf of Patrick Begou via users
Sent: Thursday, June 16, 2022 3:21 AM
To: Open MPI Users
Cc: Patrick Begou
Subject: [OMPI users] OpenMPI and names of the nodes in a cluster

Hi all,

we are facing a serious problem with OpenMPI (4.0.2) that we have
deployed on a cluster. We do not manage this large cluster and the names
of the nodes do not agree with Internet standards for protocols: they
contain a "_" (underscore) character.

So OpenMPI complains about this and does not run.

I've tried using IP addresses instead of host names in the host file,
without any success.

Is there a known workaround for this? Asking the administrators to
change the node names on this large cluster may be difficult.

Thanks

Patrick




Re: [OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-16 Thread Gilles Gouaillardet via users
Patrick,

you will likely also need to apply the same hack to opal_net_get_hostname()
in opal/util/net.c


Cheers,

Gilles

On Thu, Jun 16, 2022 at 7:30 PM Gilles Gouaillardet <
gilles.gouaillar...@gmail.com> wrote:

> Patrick,
>
> I am not sure Open MPI can do that out of the box.
>
> Maybe hacking pmix_net_get_hostname() in
> opal/mca/pmix/pmix3x/pmix/src/util/net.c can do the trick.
>
>
> Cheers,
>
> Gilles
>
> On Thu, Jun 16, 2022 at 4:24 PM Patrick Begou via users <
> users@lists.open-mpi.org> wrote:
>
>> Hi all,
>>
>> we are facing a serious problem with OpenMPI (4.0.2) that we have
>> deployed on a cluster. We do not manage this large cluster and the names
>> of the nodes do not agree with Internet standards for protocols: they
>> contain a "_" (underscore) character.
>>
>> So OpenMPI complains about this and does not run.
>>
>> I've tried using IP addresses instead of host names in the host file,
>> without any success.
>>
>> Is there a known workaround for this? Asking the administrators to
>> change the node names on this large cluster may be difficult.
>>
>> Thanks
>>
>> Patrick
>>
>>
>>


Re: [OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-16 Thread Gilles Gouaillardet via users
Patrick,

I am not sure Open MPI can do that out of the box.

Maybe hacking pmix_net_get_hostname() in
opal/mca/pmix/pmix3x/pmix/src/util/net.c can do the trick.


Cheers,

Gilles

On Thu, Jun 16, 2022 at 4:24 PM Patrick Begou via users <
users@lists.open-mpi.org> wrote:

> Hi all,
>
> we are facing a serious problem with OpenMPI (4.0.2) that we have
> deployed on a cluster. We do not manage this large cluster and the names
> of the nodes do not agree with Internet standards for protocols: they
> contain a "_" (underscore) character.
>
> So OpenMPI complains about this and does not run.
>
> I've tried using IP addresses instead of host names in the host file,
> without any success.
>
> Is there a known workaround for this? Asking the administrators to
> change the node names on this large cluster may be difficult.
>
> Thanks
>
> Patrick
>
>
>


[OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-16 Thread Patrick Begou via users

Hi all,

we are facing a serious problem with OpenMPI (4.0.2) that we have 
deployed on a cluster. We do not manage this large cluster and the names 
of the nodes do not agree with Internet standards for protocols: they 
contain a "_" (underscore) character.


So OpenMPI complains about this and does not run.

I've tried using IP addresses instead of host names in the host file, 
without any success.

Is there a known workaround for this? Asking the administrators to 
change the node names on this large cluster may be difficult.


Thanks

Patrick
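
For anyone wanting to check which entries in a host file are affected, here is a minimal sketch. It assumes a host file with one host per line, optionally followed by "slots=N", with "#" starting a comment. The function name is made up for this example. It only flags names containing characters outside letters, digits, hyphen and dot, which is a simplification of the RFC 952 / RFC 1123 host name rules; it does not reproduce whatever check Open MPI performs internally.

```shell
#!/bin/sh
# nonconforming_hosts FILE
# Print each host name in FILE that contains a character other than
# letters, digits, '-' or '.' (for example an underscore).
# Hypothetical helper; the host file layout is an assumption stated above.
nonconforming_hosts() {
    # 1. strip comments, 2. keep the first field of non-empty lines,
    # 3. keep only names with a disallowed character.
    # "|| true" keeps the exit status at 0 when nothing is flagged.
    sed -e 's/#.*//' "$1" | awk 'NF { print $1 }' | grep -E '[^A-Za-z0-9.-]' || true
}
```

Usage: nonconforming_hosts my_hostfile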