Re: [OMPI users] OpenMPI and names of the nodes in a cluster
Sorry if I did not make my intent clear. I was basically suggesting hacking the Open MPI and PMIx wrappers around hostname() to remove the problematic underscores, and make the regx components a happy panda again. Cheers, Gilles
Re: [OMPI users] OpenMPI and names of the nodes in a cluster
I think the files suggested by Gilles are more about the underlying call to get the hostname; those won't be problematic. The Open MPI regx modules are where Open MPI is running into a problem with your hostnames (i.e., your hostnames don't fit Open MPI's expectations of the hostname format). I'm surprised that using the naive module (instead of the fwd module) doesn't solve your problem. ...oh shoot, I see why. It's because I had a typo in what I suggested to you. Please try: mpirun --mca regx naive ... (i.e., "regx", not "regex") -- Jeff Squyres jsquy...@cisco.com
Re: [OMPI users] OpenMPI and names of the nodes in a cluster
Hi Jeff, Unfortunately, the workaround with "--mca regex naive" does not change the behaviour. I'm going to investigate the OpenMPI source files as suggested by Gilles. Patrick
Re: [OMPI users] OpenMPI and names of the nodes in a cluster
Ah; this is a slightly different error than what Gilles was guessing from your prior description. This is what you're running into: https://github.com/open-mpi/ompi/blob/v4.0.x/orte/mca/regx/fwd/regx_fwd.c#L130-L134 Try running with: mpirun --mca regex naive ... Specifically: the "fwd" regex component is selected by default, but it has certain expectations about the format of hostnames. Try using the "naive" regex component instead. -- Jeff Squyres jsquy...@cisco.com
Re: [OMPI users] OpenMPI and names of the nodes in a cluster
This error seems to originate in the PMIx regex framework. I'm not sure exactly which component is used, but a good starting point is one of the files in 3rd-party/openpmix/src/mca/preg/. Look for the generate_node_regex function in the different components; one of them is raising the error. George.
Re: [OMPI users] OpenMPI and names of the nodes in a cluster
Hi Gilles and Jeff, @Gilles I will have a look at these files, thanks. @Jeff this is the error message (screen dump attached), and of course the node names do not comply with the standard. Patrick
Re: [OMPI users] OpenMPI and names of the nodes in a cluster
What exactly is the error that is occurring? -- Jeff Squyres jsquy...@cisco.com
Re: [OMPI users] OpenMPI and names of the nodes in a cluster
Patrick, you will likely also need to apply the same hack to opal_net_get_hostname() in opal/util/net.c. Cheers, Gilles
Re: [OMPI users] OpenMPI and names of the nodes in a cluster
Patrick, I am not sure Open MPI can do that out of the box. Maybe hacking pmix_net_get_hostname() in opal/mca/pmix/pmix3x/pmix/src/util/net.c can do the trick. Cheers, Gilles
[OMPI users] OpenMPI and names of the nodes in a cluster
Hi all, we are facing a serious problem with OpenMPI (4.0.2), which we have deployed on a cluster. We do not manage this large cluster, and the node names do not comply with Internet standards for protocols: they contain a "_" (underscore) character. So OpenMPI complains about this and does not run. I've tried using IP addresses instead of host names in the host file, without any success. Is there a known workaround for this? Asking the administrators to change the node names on this large cluster may be difficult. Thanks Patrick