Re: [OMPI users] How are the Open MPI processes spawned?
I'll take a look at having the rsh launcher forward MCA params up to the cmd line limit, and warn if there are too many to fit. Shouldn't be too hard, I would think. On Dec 6, 2011, at 1:28 PM, Paul Kapinos wrote: > Hello Jeff, Ralph, all! > Meaning that per my output from above, what Paul was trying should have worked, no? I.e., setenv'ing OMPI_, and those env vars should magically show up in the launched process. >>> In the -launched process- yes. However, his problem was that they do not >>> show up for the -orteds-, and thus the orteds don't wireup correctly. > > Sorry for latency, too many issues on too many area needing improvement :-/ > Well, just to clarify the long story about what I have seen: > > 1. got a strange start-up problem (based on bogus configuration of eth0 + > known (for you, experts :o) bug in /1.5.x > > 2. got a workaround for (1.) by setting '-mca oob_tcp_if_include ib0 -mca > btl_tcp_if_include ib0' on the command line of mpiexec => WORKS! Many thanks > guys! > > 3. remembered that any MCA Parameters can also be defined over OMP_MCA_... > envvars, tried out to set => works NOT, the hang-ups were again here. > Checking how the MCA parameters are set by ompi_info - all clear, but doesn't > work. My blind guess was, mpiexec does not understood there envvars in this > case. > See also http://www.open-mpi.org/community/lists/users/2011/11/17823.php > > Thus this issue is not about forwarding some or any OMPI_* envvars to the > _processes_, but on someone step _before_ (the processes were not started > correctly at all in my problem case), as Ralph wrote. > > The difference in behaviour if setting parameters on command line or over > OMPI_*envvars matters! > > > Ralph Castain wrote: > >> Did you filed it, or someone else, or should I do it in some way? > > I'll take care of it, and copy you on the ticket so you can see > > what happens. I'll also do the same for the connection bug > > - sorry for the problem :-( > > Ralph, many thanks for this! 
> > Best wishes and a nice evening/day/whatever time you have! > > Paul Kapinos > > > > > -- > Dipl.-Inform. Paul Kapinos - High Performance Computing, > RWTH Aachen University, Center for Computing and Communication > Seffenter Weg 23, D 52074 Aachen (Germany) > Tel: +49 241/80-24915 > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
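The change Ralph proposes at the top of this message could look roughly like the following. This is a hypothetical plain-shell sketch, not Open MPI's actual launcher code; the limit, host name, and orted invocation are placeholders:

```shell
#!/bin/sh
# Hypothetical sketch: append OMPI_MCA_* envvars as -mca arguments to an
# rsh/ssh orted launch line, warning when the command line limit is hit.
LIMIT=4096                        # assumed command line budget
CMD="ssh remote-host orted"       # placeholder launch command

export OMPI_MCA_oob_tcp_if_include=ib0   # example params to forward
export OMPI_MCA_btl_tcp_if_include=ib0

for name in $(env | sed -n 's/^OMPI_MCA_\([^=]*\)=.*/\1/p'); do
    eval val=\$OMPI_MCA_$name
    # " -mca " plus a separating space is 7 characters of overhead
    if [ $((${#CMD} + ${#name} + ${#val} + 7)) -gt "$LIMIT" ]; then
        echo "WARNING: too many MCA params to fit on the cmd line" >&2
        break
    fi
    CMD="$CMD -mca $name $val"
done
echo "$CMD"
```

The real launcher would also have to decide what to do with the params that did not fit; the sketch simply warns and stops appending.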
Re: [OMPI users] How are the Open MPI processes spawned?
Hello Jeff, Ralph, all! Meaning that per my output from above, what Paul was trying should have worked, no? I.e., setenv'ing OMPI_, and those env vars should magically show up in the launched process. In the -launched process- yes. However, his problem was that they do not show up for the -orteds-, and thus the orteds don't wireup correctly. Sorry for the latency; too many issues in too many areas needing improvement :-/ Well, just to clarify the long story of what I have seen: 1. Got a strange start-up problem (based on a bogus configuration of eth0 + a known (to you experts :o) bug in /1.5.x). 2. Got a workaround for (1.) by setting '-mca oob_tcp_if_include ib0 -mca btl_tcp_if_include ib0' on the command line of mpiexec => WORKS! Many thanks, guys! 3. Remembered that any MCA parameters can also be defined via OMPI_MCA_... envvars, tried setting them => does NOT work; the hang-ups were back. Checking how the MCA parameters are set with ompi_info - all clear, but it doesn't work. My blind guess was that mpiexec does not understand these envvars in this case. See also http://www.open-mpi.org/community/lists/users/2011/11/17823.php Thus this issue is not about forwarding some or any OMPI_* envvars to the _processes_, but about some step _before_ that (the processes were not started correctly at all in my problem case), as Ralph wrote. The difference in behaviour between setting parameters on the command line and via OMPI_* envvars matters! Ralph Castain wrote: >> Did you filed it, or someone else, or should I do it in some way? > I'll take care of it, and copy you on the ticket so you can see > what happens. I'll also do the same for the connection bug > - sorry for the problem :-( Ralph, many thanks for this! Best wishes and a nice evening/day/whatever time you have! Paul Kapinos -- Dipl.-Inform. 
Paul Kapinos - High Performance Computing, RWTH Aachen University, Center for Computing and Communication Seffenter Weg 23, D 52074 Aachen (Germany) Tel: +49 241/80-24915
Re: [OMPI users] How are the Open MPI processes spawned?
On Nov 28, 2011, at 7:39 PM, Ralph Castain wrote: >> Meaning that per my output from above, what Paul was trying should have >> worked, no? I.e., setenv'ing OMPI_, and those env vars should >> magically show up in the launched process. > > In the -launched process- yes. However, his problem was that they do not show > up for the -orteds-, and thus the orteds don't wireup correctly. Now I get it. Sorry for the noise. -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] How are the Open MPI processes spawned?
On Nov 28, 2011, at 5:32 PM, Jeff Squyres wrote: > On Nov 28, 2011, at 6:56 PM, Ralph Castain wrote: > Right-o. Knew there was something I forgot... > >> So on rsh, we do not put envar mca params onto the orted cmd line. This has >> been noted repeatedly on the user and devel lists, so it really has always >> been the case. > > So they're sent as part of the launch command (i.e., out of band -- not on > the rsh/ssh command line), right? Yes > > Meaning that per my output from above, what Paul was trying should have > worked, no? I.e., setenv'ing OMPI_, and those env vars should > magically show up in the launched process. In the -launched process- yes. However, his problem was that they do not show up for the -orteds-, and thus the orteds don't wireup correctly.
Re: [OMPI users] How are the Open MPI processes spawned?
On Nov 28, 2011, at 6:56 PM, Ralph Castain wrote: > I'm afraid that example is incorrect - you were running under slurm on your > cluster, not rsh. Ummm... right. Duh. > If you look at the actual code, you will see that we slurp up the envars into > the environment of each app_context, and then send that to the backend. Ah, right -- here's running under rsh (SVN trunk): - [16:26] svbu-mpi:~ % cat run #!/bin/csh -f echo on `hostname`, foo is: $OMPI_MCA_foo exit 0 [16:26] svbu-mpi:~ % mpirun -np 2 --host svbu-mpi043,svbu-mpi044 run OMPI_MCA_foo: Undefined variable. OMPI_MCA_foo: Undefined variable. --- While the primary job terminated normally, 2 processes returned non-zero exit codes.. Further examination may be required. --- [16:26] svbu-mpi:~ % setenv OMPI_MCA_foo bar [16:27] svbu-mpi:~ % mpirun -np 2 --host svbu-mpi043,svbu-mpi044 run on svbu-mpi044, foo is: bar on svbu-mpi043, foo is: bar [16:27] svbu-mpi:~ % - (the "MCA_" here is superfluous -- I looked at the code and the man page that Paul cited, and see that we grab all env vars OMPI_*, not OMPI_MCA_*). > In environments like slurm, we can also apply those envars to the launch of > the orteds as we pass the env to the API that launches the orteds. You cannot > do that with rsh, as you know. Right-o. Knew there was something I forgot... > So on rsh, we do not put envar mca params onto the orted cmd line. This has > been noted repeatedly on the user and devel lists, so it really has always > been the case. So they're sent as part of the launch command (i.e., out of band -- not on the rsh/ssh command line), right? Meaning that per my output from above, what Paul was trying should have worked, no? I.e., setenv'ing OMPI_, and those env vars should magically show up in the launched process. [after performing some RTFM...] 
at least the man page of mpiexec says, the OMPI_ environment variables are always provided and thus treated *differently* than other envvars: $ man mpiexec Exported Environment Variables All environment variables that are named in the form OMPI_* will automatically be exported to new processes on the local and remote nodes. So, tells the man page lies, or this is an removed feature, or something else? -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
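The man page's export rule quoted above can be probed without a cluster. A minimal sketch, assuming only a POSIX shell and no Open MPI installation, of selecting exactly the variables the rule names:

```shell
#!/bin/sh
# Sketch of the man page's rule: variables whose names match OMPI_*
# are the candidates for export to new processes; everything else is
# left to the launcher's normal environment handling.
export OMPI_MCA_oob_tcp_if_include=ib0
export OMPI_MCA_btl_tcp_if_include=ib0
export SOME_OTHER_VAR=not_exported     # would not match the OMPI_* rule

# List exactly the variables the rule would select:
env | grep '^OMPI_' | sort
```

Note that, per the thread, this describes what reaches the *launched processes*; it says nothing about the envvars reaching the orteds during wireup, which is where Paul's hang occurred.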
Re: [OMPI users] How are the Open MPI processes spawned?
I'm afraid that example is incorrect - you were running under slurm on your cluster, not rsh. If you look at the actual code, you will see that we slurp up the envars into the environment of each app_context, and then send that to the backend. In environments like slurm, we can also apply those envars to the launch of the orteds as we pass the env to the API that launches the orteds. You cannot do that with rsh, as you know. So on rsh, we do not put envar mca params onto the orted cmd line. This has been noted repeatedly on the user and devel lists, so it really has always been the case. HTH Ralph On Nov 28, 2011, at 3:39 PM, Jeff Squyres wrote: > (off list) > > Are you sure about OMPI_MCA_* params not being treated specially? I know for > a fact that they *used* to be. I.e., we bundled up all env variables that > began with OMPI_MCA_* and sent them with the job to back-end nodes. It > allowed sysadmins to set global MCA param values without editing the MCA > param file on every node. > > It looks like this is still happening on the trunk: > > [14:38] svbu-mpi:~ % cat run > #!/bin/csh -f > > echo on `hostname`, foo is: $OMPI_MCA_foo > exit 0 > [14:38] svbu-mpi:~ % setenv OMPI_MCA_foo bar > [14:38] svbu-mpi:~ % ./run > on svbu-mpi.cisco.com, foo is: bar > [14:38] svbu-mpi:~ % mpirun -np 2 --bynode run > on svbu-mpi044, foo is: bar > on svbu-mpi043, foo is: bar > [14:38] svbu-mpi:~ % unsetenv OMPI_MCA_foo > [14:38] svbu-mpi:~ % mpirun -np 2 --bynode run > OMPI_MCA_foo: Undefined variable. > OMPI_MCA_foo: Undefined variable. > --- > While the primary job terminated normally, 2 processes returned > non-zero exit codes.. Further examination may be required. > --- > [14:38] svbu-mpi:~ % > > (I did not read this thread too carefully, so perhaps I missed an inference > in here somewhere...) 
> > > > > > On Nov 25, 2011, at 5:21 PM, Ralph Castain wrote: > >> >> On Nov 25, 2011, at 12:29 PM, Paul Kapinos wrote: >> >>> Hello again, >>> > Ralph Castain wrote: >> Yes, that would indeed break things. The 1.5 series isn't correctly >> checking connections across multiple interfaces until it finds one that >> works - it just uses the first one it sees. :-( > Yahhh!! > This behaviour - catch a random interface and hang forever if something > is wrong with it - is somewhat less than perfect. > > From my perspective - the users one - OpenMPI should try to use eitcher > *all* available networks (as 1.4 it does...), starting with the high > performance ones, or *only* those interfaces on which the hostnames from > the hostfile are bound to. It is indeed supposed to do the former - as I implied, this is a bug in the 1.5 series. >>> >>> Thanks for clarification. I was not sure about this is a bug or a feature >>> :-) >>> >>> >>> > Also, there should be timeouts (if you cannot connect to a node within a > minute you probably will never ever be connected...) We have debated about this for some time - there is a timeout mca param one can set, but we'll consider again making it default. > If some connection runs into a timeout a warning would be great (and a > hint to take off the interface by oob_tcp_if_exclude, btl_tcp_if_exclude). > > Should it not? > Maybe you can file it as a "call for enhancement"... Probably the right approach at this time. >>> >>> Ahhh.. sorry, did not understand what you mean. >>> Did you filed it, or someone else, or should I do it in some way? Or should >>> not? >> >> I'll take care of it, and copy you on the ticket so you can see what happens. >> >> I'll also do the same for the connection bug - sorry for the problem :-( >> >> >>> >>> >>> >>> >>> >>> > But then I ran into yet another one issue. In > http://www.open-mpi.org/faq/?category=tuning#setting-mca-params > the way to define MCA parameters over environment variables is described. 
> > I tried it: > $ export OMPI_MCA_oob_tcp_if_include=ib0 > $ export OMPI_MCA_btl_tcp_if_include=ib0 > > > I checked it: > $ ompi_info --param all all | grep oob_tcp_if_include > MCA oob: parameter "oob_tcp_if_include" (current value: > , data source: environment or cmdline) > $ ompi_info --param all all | grep btl_tcp_if_include > MCA btl: parameter "btl_tcp_if_include" (current value: > , data source: environment or cmdline) > > > But then I get again the hang-up issue! > > ==> seem, mpiexec does not understand these environment variables! and > only get the command line options. This should not be so? No, that isn't what is happening. The problem lies in the behavior of rsh/ssh. This environment does not forward environmental variables. Because of limits on cmd line length, we don't automatically forward MCA params from the environment, but only from the cmd line. It is an annoying limitation, but one outside our control.
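Ralph's point about rsh/ssh can be demonstrated without a cluster. A minimal sketch, assuming only a POSIX shell: a locally spawned child (analogous to the slurm case, where the environment is passed to the launch API) sees the exported variable, while an ssh-launched remote shell would not:

```shell
#!/bin/sh
# A locally spawned child process inherits exported envvars, which is
# why slurm-style API launches can carry MCA params in the environment.
export OMPI_MCA_foo=bar
sh -c 'echo "local child sees: $OMPI_MCA_foo"'

# An ssh launch starts a fresh environment on the remote side, so the
# same variable would come back empty (illustration only, not run here):
#   ssh remote-host 'echo "remote child sees: $OMPI_MCA_foo"'
```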
Re: [OMPI users] How are the Open MPI processes spawned?
On Nov 28, 2011, at 5:39 PM, Jeff Squyres wrote: > (off list) Hah! So much for me discretely asking off-list before coming back with a definitive answer... :-\ -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] How are the Open MPI processes spawned?
(off list) Are you sure about OMPI_MCA_* params not being treated specially? I know for a fact that they *used* to be. I.e., we bundled up all env variables that began with OMPI_MCA_* and sent them with the job to back-end nodes. It allowed sysadmins to set global MCA param values without editing the MCA param file on every node. It looks like this is still happening on the trunk: [14:38] svbu-mpi:~ % cat run #!/bin/csh -f echo on `hostname`, foo is: $OMPI_MCA_foo exit 0 [14:38] svbu-mpi:~ % setenv OMPI_MCA_foo bar [14:38] svbu-mpi:~ % ./run on svbu-mpi.cisco.com, foo is: bar [14:38] svbu-mpi:~ % mpirun -np 2 --bynode run on svbu-mpi044, foo is: bar on svbu-mpi043, foo is: bar [14:38] svbu-mpi:~ % unsetenv OMPI_MCA_foo [14:38] svbu-mpi:~ % mpirun -np 2 --bynode run OMPI_MCA_foo: Undefined variable. OMPI_MCA_foo: Undefined variable. --- While the primary job terminated normally, 2 processes returned non-zero exit codes.. Further examination may be required. --- [14:38] svbu-mpi:~ % (I did not read this thread too carefully, so perhaps I missed an inference in here somewhere...) On Nov 25, 2011, at 5:21 PM, Ralph Castain wrote: > > On Nov 25, 2011, at 12:29 PM, Paul Kapinos wrote: > >> Hello again, >> Ralph Castain wrote: > Yes, that would indeed break things. The 1.5 series isn't correctly > checking connections across multiple interfaces until it finds one that > works - it just uses the first one it sees. :-( Yahhh!! This behaviour - catch a random interface and hang forever if something is wrong with it - is somewhat less than perfect. From my perspective - the users one - OpenMPI should try to use eitcher *all* available networks (as 1.4 it does...), starting with the high performance ones, or *only* those interfaces on which the hostnames from the hostfile are bound to. >>> It is indeed supposed to do the former - as I implied, this is a bug in the >>> 1.5 series. >> >> Thanks for clarification. 
I was not sure about this is a bug or a feature :-) >> >> >> Also, there should be timeouts (if you cannot connect to a node within a minute you probably will never ever be connected...) >>> We have debated about this for some time - there is a timeout mca param one >>> can set, but we'll consider again making it default. If some connection runs into a timeout a warning would be great (and a hint to take off the interface by oob_tcp_if_exclude, btl_tcp_if_exclude). Should it not? Maybe you can file it as a "call for enhancement"... >>> Probably the right approach at this time. >> >> Ahhh.. sorry, did not understand what you mean. >> Did you filed it, or someone else, or should I do it in some way? Or should >> not? > > I'll take care of it, and copy you on the ticket so you can see what happens. > > I'll also do the same for the connection bug - sorry for the problem :-( > > >> >> >> >> >> >> But then I ran into yet another one issue. In http://www.open-mpi.org/faq/?category=tuning#setting-mca-params the way to define MCA parameters over environment variables is described. I tried it: $ export OMPI_MCA_oob_tcp_if_include=ib0 $ export OMPI_MCA_btl_tcp_if_include=ib0 I checked it: $ ompi_info --param all all | grep oob_tcp_if_include MCA oob: parameter "oob_tcp_if_include" (current value: , data source: environment or cmdline) $ ompi_info --param all all | grep btl_tcp_if_include MCA btl: parameter "btl_tcp_if_include" (current value: , data source: environment or cmdline) But then I get again the hang-up issue! ==> seem, mpiexec does not understand these environment variables! and only get the command line options. This should not be so? >>> No, that isn't what is happening. The problem lies in the behavior of >>> rsh/ssh. This environment does not forward environmental variables. Because >>> of limits on cmd line length, we don't automatically forward MCA params >>> from the environment, but only from the cmd line. 
It is an annoying >>> limitation, but one outside our control. >> >> We know about "ssh does not forward environmental variables." But in this >> case, are these parameters not the parameters of mpiexec itself, too? >> >> The crucial thing is, that setting of the parameters works over the command >> line but *does not work* over the envvar way (as in >> http://www.open-mpi.org/faq/?category=tuning#setting-mca-params described). >> This looks like a bug for me! >> >> >> >> >> >>> Put those envars in the default mca param file and the problem will be >>> resolved. >> >> You mean e.g. $prefix/etc/openmpi-mca-params.conf as described in 4. of >> http://www.open-mpi.org/faq/?category=tuning#setting-mca-params
Re: [OMPI users] How are the Open MPI processes spawned?
On Nov 25, 2011, at 12:29 PM, Paul Kapinos wrote: > Hello again, > >>> Ralph Castain wrote: Yes, that would indeed break things. The 1.5 series isn't correctly checking connections across multiple interfaces until it finds one that works - it just uses the first one it sees. :-( >>> Yahhh!! >>> This behaviour - catch a random interface and hang forever if something is >>> wrong with it - is somewhat less than perfect. >>> >>> From my perspective - the users one - OpenMPI should try to use eitcher >>> *all* available networks (as 1.4 it does...), starting with the high >>> performance ones, or *only* those interfaces on which the hostnames from >>> the hostfile are bound to. >> It is indeed supposed to do the former - as I implied, this is a bug in the >> 1.5 series. > > Thanks for clarification. I was not sure about this is a bug or a feature :-) > > > >>> Also, there should be timeouts (if you cannot connect to a node within a >>> minute you probably will never ever be connected...) >> We have debated about this for some time - there is a timeout mca param one >> can set, but we'll consider again making it default. >>> If some connection runs into a timeout a warning would be great (and a hint >>> to take off the interface by oob_tcp_if_exclude, btl_tcp_if_exclude). >>> >>> Should it not? >>> Maybe you can file it as a "call for enhancement"... >> Probably the right approach at this time. > > Ahhh.. sorry, did not understand what you mean. > Did you filed it, or someone else, or should I do it in some way? Or should > not? I'll take care of it, and copy you on the ticket so you can see what happens. I'll also do the same for the connection bug - sorry for the problem :-( > > > > > > >>> But then I ran into yet another one issue. In >>> http://www.open-mpi.org/faq/?category=tuning#setting-mca-params >>> the way to define MCA parameters over environment variables is described. 
>>> >>> I tried it: >>> $ export OMPI_MCA_oob_tcp_if_include=ib0 >>> $ export OMPI_MCA_btl_tcp_if_include=ib0 >>> >>> >>> I checked it: >>> $ ompi_info --param all all | grep oob_tcp_if_include >>>MCA oob: parameter "oob_tcp_if_include" (current value: >>> , data source: environment or cmdline) >>> $ ompi_info --param all all | grep btl_tcp_if_include >>>MCA btl: parameter "btl_tcp_if_include" (current value: >>> , data source: environment or cmdline) >>> >>> >>> But then I get again the hang-up issue! >>> >>> ==> seem, mpiexec does not understand these environment variables! and only >>> get the command line options. This should not be so? >> No, that isn't what is happening. The problem lies in the behavior of >> rsh/ssh. This environment does not forward environmental variables. Because >> of limits on cmd line length, we don't automatically forward MCA params from >> the environment, but only from the cmd line. It is an annoying limitation, >> but one outside our control. > > We know about "ssh does not forward environmental variables." But in this > case, are these parameters not the parameters of mpiexec itself, too? > > The crucial thing is, that setting of the parameters works over the command > line but *does not work* over the envvar way (as in > http://www.open-mpi.org/faq/?category=tuning#setting-mca-params described). > This looks like a bug for me! > > > > > >> Put those envars in the default mca param file and the problem will be >> resolved. > > You mean e.g. $prefix/etc/openmpi-mca-params.conf as described in 4. of > http://www.open-mpi.org/faq/?category=tuning#setting-mca-params > > Well, this is possible, but not flexible enough for us (because there are > some machines which only can run if the parameters are *not* set - on those > the ssh goes just over these eth0 devices). > > By now we use the command line parameters and hope the envvar way will work > sometimes. 
> > >>> (I also tried to advise to provide the envvars by -x >>> OMPI_MCA_oob_tcp_if_include -x OMPI_MCA_btl_tcp_if_include - nothing >>> changed. >> I'm surprised by that - they should be picked up and forwarded. Could be a >> bug > > Well, I also mean this is a bug, but as said not on providing the values of > envvars but on detecting of these parameters at all. Or maybe on both. > > > > >>> Well, they are OMPI_ variables and should be provided in any case). >> No, they aren't - they are not treated differently than any other envar. > > [after performing some RTFM...] > at least the man page of mpiexec says, the OMPI_ environment variables are > always provided and thus treated *differently* than other envvars: > > $ man mpiexec > > Exported Environment Variables > All environment variables that are named in the form OMPI_* will > automatically be exported to new processes on the local and remote nodes. > > So, tells the man page lies, or this is an removed feature, or something else? > > > Best wishes, > >
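Ralph's suggestion earlier in this message (put the envars in the default MCA param file) would look roughly like this for the setup discussed in the thread; the file location follows the FAQ's description, and the values are the thread's ib0 settings:

```
# $prefix/etc/openmpi-mca-params.conf
oob_tcp_if_include = ib0
btl_tcp_if_include = ib0
```

As Paul notes above, this file is global to the installation, which is exactly why it is not flexible enough when some machines must run *without* these settings.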
Re: [OMPI users] How are the Open MPI processes spawned?
Hello again, Ralph Castain wrote: Yes, that would indeed break things. The 1.5 series isn't correctly checking connections across multiple interfaces until it finds one that works - it just uses the first one it sees. :-( Yahhh!! This behaviour - catch a random interface and hang forever if something is wrong with it - is somewhat less than perfect. From my perspective - the user's one - OpenMPI should try to use either *all* available networks (as 1.4 does...), starting with the high-performance ones, or *only* those interfaces to which the hostnames from the hostfile are bound. It is indeed supposed to do the former - as I implied, this is a bug in the 1.5 series. Thanks for the clarification. I was not sure whether this is a bug or a feature :-) Also, there should be timeouts (if you cannot connect to a node within a minute, you probably will never ever be connected...) We have debated about this for some time - there is a timeout mca param one can set, but we'll consider again making it default. If some connection runs into a timeout, a warning would be great (and a hint to exclude the interface via oob_tcp_if_exclude, btl_tcp_if_exclude). Should it not? Maybe you can file it as a "call for enhancement"... Probably the right approach at this time. Ahhh.. sorry, I did not understand what you meant. Did you file it, or someone else, or should I do it in some way? Or should I not? But then I ran into yet another issue. In http://www.open-mpi.org/faq/?category=tuning#setting-mca-params the way to define MCA parameters via environment variables is described. 
I tried it: $ export OMPI_MCA_oob_tcp_if_include=ib0 $ export OMPI_MCA_btl_tcp_if_include=ib0 I checked it: $ ompi_info --param all all | grep oob_tcp_if_include MCA oob: parameter "oob_tcp_if_include" (current value: , data source: environment or cmdline) $ ompi_info --param all all | grep btl_tcp_if_include MCA btl: parameter "btl_tcp_if_include" (current value: , data source: environment or cmdline) But then I got the hang-up issue again! ==> It seems mpiexec does not understand these environment variables and only honors the command line options. This should not be so, right? No, that isn't what is happening. The problem lies in the behavior of rsh/ssh. This environment does not forward environmental variables. Because of limits on cmd line length, we don't automatically forward MCA params from the environment, but only from the cmd line. It is an annoying limitation, but one outside our control. We know that "ssh does not forward environmental variables." But in this case, are these parameters not the parameters of mpiexec itself, too? The crucial thing is that setting the parameters works via the command line but *does not work* via the envvar way (as described in http://www.open-mpi.org/faq/?category=tuning#setting-mca-params). This looks like a bug to me! Put those envars in the default mca param file and the problem will be resolved. You mean e.g. $prefix/etc/openmpi-mca-params.conf as described in 4. of http://www.open-mpi.org/faq/?category=tuning#setting-mca-params Well, this is possible, but not flexible enough for us (because there are some machines which can only run if the parameters are *not* set - on those, ssh goes just over these eth0 devices). For now we use the command line parameters and hope the envvar way will work someday. (I also tried to provide the envvars with -x OMPI_MCA_oob_tcp_if_include -x OMPI_MCA_btl_tcp_if_include - nothing changed. I'm surprised by that - they should be picked up and forwarded. 
Could be a bug Well, I also think this is a bug, but as said, not in providing the values of the envvars but in detecting these parameters at all. Or maybe in both. Well, they are OMPI_ variables and should be provided in any case). No, they aren't - they are not treated differently than any other envar. [after performing some RTFM...] at least the man page of mpiexec says the OMPI_ environment variables are always provided and thus treated *differently* than other envvars: $ man mpiexec Exported Environment Variables All environment variables that are named in the form OMPI_* will automatically be exported to new processes on the local and remote nodes. So, does the man page lie, or is this a removed feature, or something else? Best wishes, Paul Kapinos Specifying both include and exclude should generate an error as those are mutually exclusive options - I think this was also missed in early 1.5 releases and was recently patched. HTH Ralph On Nov 23, 2011, at 12:14 PM, TERRY DONTJE wrote: On 11/23/2011 2:02 PM, Paul Kapinos wrote: Hello Ralph, hello all, Two pieces of news, as usual a good and a bad one. The good: we believe we found out *why* it hangs. The bad: it seems to me this is a bug, or at least an undocumented feature, of Open MPI /1.5.x. In
Re: [OMPI users] How are the Open MPI processes spawned?
On Nov 24, 2011, at 11:49 AM, Paul Kapinos wrote: > Hello Ralph, Terry, all! > > again, two news: the good one and the second one. > > Ralph Castain wrote: >> Yes, that would indeed break things. The 1.5 series isn't correctly checking >> connections across multiple interfaces until it finds one that works - it >> just uses the first one it sees. :-( > > Yahhh!! > This behaviour - catch a random interface and hang forever if something is > wrong with it - is somewhat less than perfect. > > From my perspective - the users one - OpenMPI should try to use eitcher *all* > available networks (as 1.4 it does...), starting with the high performance > ones, or *only* those interfaces on which the hostnames from the hostfile are > bound to. It is indeed supposed to do the former - as I implied, this is a bug in the 1.5 series. > > Also, there should be timeouts (if you cannot connect to a node within a > minute you probably will never ever be connected...) We have debated about this for some time - there is a timeout mca param one can set, but we'll consider again making it default. > > If some connection runs into a timeout a warning would be great (and a hint > to take off the interface by oob_tcp_if_exclude, btl_tcp_if_exclude). > > Should it not? > Maybe you can file it as a "call for enhancement"... Probably the right approach at this time. > > > >> The solution is to specify -mca oob_tcp_if_include ib0. This will direct the >> run-time wireup across the IP over IB interface. >> You will also need the -mca btl_tcp_if_include ib0 as well so the MPI comm >> goes exclusively over that network. > > YES! This works. Adding > -mca oob_tcp_if_include ib0 -mca btl_tcp_if_include ib0 > to the command line of mpiexec helps me to run the 1.5.x programs, so I > believe this is the workaround. > > Many thanks for this hint, Ralph! 
My fail to not to find it in the FAQ (I was > so close :o) http://www.open-mpi.org/faq/?category=tcp#tcp-selection > > But then I ran into yet another one issue. In > http://www.open-mpi.org/faq/?category=tuning#setting-mca-params > the way to define MCA parameters over environment variables is described. > > I tried it: > $ export OMPI_MCA_oob_tcp_if_include=ib0 > $ export OMPI_MCA_btl_tcp_if_include=ib0 > > > I checked it: > $ ompi_info --param all all | grep oob_tcp_if_include > MCA oob: parameter "oob_tcp_if_include" (current value: > , data source: environment or cmdline) > $ ompi_info --param all all | grep btl_tcp_if_include > MCA btl: parameter "btl_tcp_if_include" (current value: > , data source: environment or cmdline) > > > But then I get again the hang-up issue! > > ==> seem, mpiexec does not understand these environment variables! and only > get the command line options. This should not be so? No, that isn't what is happening. The problem lies in the behavior of rsh/ssh. This environment does not forward environmental variables. Because of limits on cmd line length, we don't automatically forward MCA params from the environment, but only from the cmd line. It is an annoying limitation, but one outside our control. Put those envars in the default mca param file and the problem will be resolved. > > (I also tried to advise to provide the envvars by -x > OMPI_MCA_oob_tcp_if_include -x OMPI_MCA_btl_tcp_if_include - nothing changed. I'm surprised by that - they should be picked up and forwarded. Could be a bug > Well, they are OMPI_ variables and should be provided in any case). No, they aren't - they are not treated differently than any other envar. > > > Best wishes and many thanks for all, > > Paul Kapinos > > > > >> Specifying both include and exclude should generate an error as those are >> mutually exclusive options - I think this was also missed in early 1.5 >> releases and was recently patched. 
>> HTH >> Ralph >> On Nov 23, 2011, at 12:14 PM, TERRY DONTJE wrote: >>> On 11/23/2011 2:02 PM, Paul Kapinos wrote: Hello Ralph, hello all, Two news, as usual a good and a bad one. The good: we believe to find out *why* it hangs The bad: it seem for me, this is a bug or at least undocumented feature of Open MPI /1.5.x. In detail: As said, we see mystery hang-ups if starting on some nodes using some permutation of hostnames. Usually removing "some bad" nodes helps, sometimes a permutation of node names in the hostfile is enough(!). The behaviour is reproducible. The machines have at least 2 networks: *eth0* is used for installation, monitoring, ... - this ethernet is very slim *ib0* - is the "IP over IB" interface and is used for everything: the file systems, ssh and so on. The hostnames are bound to the ib0 network; our idea was not to use eth0 for MPI at all. all machines are available from any over ib0 (are in one network). But on eth0 there
Re: [OMPI users] How are the Open MPI processes spawned?
Hello Ralph, Terry, all! Again, two news: the good one and the second one.

Ralph Castain wrote: Yes, that would indeed break things. The 1.5 series isn't correctly checking connections across multiple interfaces until it finds one that works - it just uses the first one it sees. :-(

Yahhh!! This behaviour - catching a random interface and hanging forever if something is wrong with it - is somewhat less than perfect. From my perspective - the user's one - Open MPI should try to use either *all* available networks (as 1.4 does...), starting with the high-performance ones, or *only* those interfaces to which the hostnames from the hostfile are bound. Also, there should be timeouts (if you cannot connect to a node within a minute you probably will never ever be connected...). If some connection runs into a timeout, a warning would be great (and a hint to exclude the interface via oob_tcp_if_exclude, btl_tcp_if_exclude). Should it not? Maybe you can file it as a "call for enhancement"...

The solution is to specify -mca oob_tcp_if_include ib0. This will direct the run-time wireup across the IP over IB interface. You will also need the -mca btl_tcp_if_include ib0 as well so the MPI comm goes exclusively over that network.

YES! This works. Adding -mca oob_tcp_if_include ib0 -mca btl_tcp_if_include ib0 to the command line of mpiexec lets me run the 1.5.x programs, so I believe this is the workaround. Many thanks for this hint, Ralph! My failure was not finding it in the FAQ (I was so close :o) http://www.open-mpi.org/faq/?category=tcp#tcp-selection

But then I ran into yet another issue. In http://www.open-mpi.org/faq/?category=tuning#setting-mca-params the way to define MCA parameters via environment variables is described.
I tried it:
$ export OMPI_MCA_oob_tcp_if_include=ib0
$ export OMPI_MCA_btl_tcp_if_include=ib0

I checked it:
$ ompi_info --param all all | grep oob_tcp_if_include
MCA oob: parameter "oob_tcp_if_include" (current value: , data source: environment or cmdline)
$ ompi_info --param all all | grep btl_tcp_if_include
MCA btl: parameter "btl_tcp_if_include" (current value: , data source: environment or cmdline)

But then I got the hang-up issue again!

==> It seems mpiexec does not understand these environment variables and only honors the command line options. Should this be so? (I also tried to provide the envvars via -x OMPI_MCA_oob_tcp_if_include -x OMPI_MCA_btl_tcp_if_include - nothing changed. Well, they are OMPI_ variables and should be provided in any case.)

Best wishes and many thanks for all, Paul Kapinos

Specifying both include and exclude should generate an error as those are mutually exclusive options - I think this was also missed in early 1.5 releases and was recently patched. HTH Ralph

On Nov 23, 2011, at 12:14 PM, TERRY DONTJE wrote: On 11/23/2011 2:02 PM, Paul Kapinos wrote: Hello Ralph, hello all, Two news, as usual a good and a bad one. The good: we believe we found out *why* it hangs. The bad: it seems to me this is a bug, or at least an undocumented feature, of Open MPI /1.5.x. In detail: As said, we see mysterious hang-ups when starting on some nodes using some permutation of hostnames. Usually removing "some bad" nodes helps; sometimes a permutation of node names in the hostfile is enough(!). The behaviour is reproducible. The machines have at least 2 networks: *eth0* is used for installation, monitoring, ... - this ethernet is very slim. *ib0* is the "IP over IB" interface and is used for everything: the file systems, ssh and so on. The hostnames are bound to the ib0 network; our idea was not to use eth0 for MPI at all. All machines are reachable from any other over ib0 (they are in one network).
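The env-var convention tried above can be sketched as follows. The exports are real Open MPI convention (OMPI_MCA_<param_name>); the point of the thread is that they only affect the *local* node, because rsh/ssh starts the remote orteds with a fresh environment. The `somenode` host in the comment is a placeholder.

```shell
# Sketch: the OMPI_MCA_<param_name> environment-variable convention.
# These exports are seen by mpiexec and ompi_info on the local node,
# but - as Ralph explains - rsh/ssh does NOT forward them to the
# remote orteds, which is why the wireup still hangs.
export OMPI_MCA_oob_tcp_if_include=ib0
export OMPI_MCA_btl_tcp_if_include=ib0

# Visible locally:
env | grep '^OMPI_MCA_'
# On a remote node the same check would typically come back empty, e.g.:
# ssh somenode 'env | grep OMPI_MCA_'   # somenode is a placeholder
```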
But on eth0 there are at least two different networks; in particular, the computer linuxbsc025 is in a different network than the others and is not reachable from other nodes over eth0! (but reachable over ib0; the name used in the hostfile is resolved to the IP of ib0). So I believe that Open MPI /1.5.x tries to communicate over eth0, cannot do it, and hangs. The /1.4.3 does not hang, so this issue is 1.5.x-specific (seen in 1.5.3 and 1.5.4). A bug? I also tried to disable the eth0 completely:
$ mpiexec -mca btl_tcp_if_exclude eth0,lo -mca btl_tcp_if_include ib0 ...
I believe if you give "-mca btl_tcp_if_include ib0" you do not need to specify the exclude parameter. ...but this does not help. All right, the above command should disable the usage of eth0 for MPI communication itself, but it hangs just before MPI is started, doesn't it? (because one process is missing, MPI_Init cannot complete) By "just before the MPI is started" do you mean while orte is launching the processes? I wonder if you need to specify "-mca oob_tcp_if_include ib0" also, but I think that may depend on which oob you are using. Now a question: is there a way to forbid the mpiexec to
Re: [OMPI users] How are the Open MPI processes spawned?
Yes, that would indeed break things. The 1.5 series isn't correctly checking connections across multiple interfaces until it finds one that works - it just uses the first one it sees. :-( The solution is to specify -mca oob_tcp_if_include ib0. This will direct the run-time wireup across the IP over IB interface. You will also need the -mca btl_tcp_if_include ib0 as well so the MPI comm goes exclusively over that network. Specifying both include and exclude should generate an error as those are mutually exclusive options - I think this was also missed in early 1.5 releases and was recently patched. HTH Ralph On Nov 23, 2011, at 12:14 PM, TERRY DONTJE wrote: > On 11/23/2011 2:02 PM, Paul Kapinos wrote: >> >> Hello Ralph, hello all, >> >> Two news, as usual a good and a bad one. >> >> The good: we believe to find out *why* it hangs >> >> The bad: it seem for me, this is a bug or at least undocumented feature of >> Open MPI /1.5.x. >> >> In detail: >> As said, we see mystery hang-ups if starting on some nodes using some >> permutation of hostnames. Usually removing "some bad" nodes helps, sometimes >> a permutation of node names in the hostfile is enough(!). The behaviour is >> reproducible. >> >> The machines have at least 2 networks: >> >> *eth0* is used for installation, monitoring, ... - this ethernet is very >> slim >> >> *ib0* - is the "IP over IB" interface and is used for everything: the file >> systems, ssh and so on. The hostnames are bound to the ib0 network; our idea >> was not to use eth0 for MPI at all. >> >> all machines are available from any over ib0 (are in one network). >> >> But on eth0 there are at least two different networks; especially the >> computer linuxbsc025 is in different network than the others and is not >> reachable from other nodes over eth0! (but reachable over ib0. The name used >> in the hostfile is resolved to the IP of ib0 ). >> >> So I believe that Open MPI /1.5.x tries to communicate over eth0 and cannot >> do it, and hangs. 
The /1.4.3 does not hang, so this issue is 1.5.x-specific >> (seen in 1.5.3 and 1.5.4). A bug? >> >> I also tried to disable the eth0 completely: >> >> $ mpiexec -mca btl_tcp_if_exclude eth0,lo -mca btl_tcp_if_include ib0 ... >> > I believe if you give "-mca btl_tcp_if_include ib0" you do not need to > specify the exclude parameter. >> ...but this does not help. All right, the above command should disable the >> usage of eth0 for MPI communication itself, but it hangs just before the MPI >> is started, isn't it? (because one process lacks, the MPI_INIT cannot be >> passed) >> > By "just before the MPI is started" do you mean while orte is launching the > processes. > I wonder if you need to specify "-mca oob_tcp_if_include ib0" also but I > think that may depend on which oob you are using. >> Now a question: is there a way to forbid the mpiexec to use some interfaces >> at all? >> >> Best wishes, >> >> Paul Kapinos >> >> P.S. Of course we know about the good idea to bring all nodes into the same >> net on eth0, but at this point it is impossible due of technical >> reason[s]... >> >> P.S.2 I'm not sure that the issue is really rooted in the above mentioned >> misconfiguration of eth0, but I have no better idea at this point... >> >> The map seem to be correctly build, also the output if the daemons seem to be the same (see helloworld.txt) >>> >>> Unfortunately, it appears that OMPI was not built with --enable-debug as >>> there is no debug info in the output. Without a debug installation of OMPI, >>> the ability to determine the problem is pretty limited. >> >> well, this will be the next option we will activate. We also have another >> issue here, on (not) using uDAPL.. >> >> >>> >>> > You should also try putting that long list of nodes in a hostfile - see > if that makes a difference. > It will process the nodes thru a different code path, so if there is some > problem in --host, > this will tell us. 
No, with a hostfile instead of a host list on the command line the behaviour is the same. But, I just found out that 1.4.3 does *not* hang on this constellation. The next thing I will try will be the installation of 1.5.4 :o) Best, Paul P.S. started:
$ /opt/MPI/openmpi-1.5.3/linux/intel/bin/mpiexec --hostfile hostfile-mini -mca odls_base_verbose 5 --leave-session-attached --display-map helloworld 2>&1 | tee helloworld.txt
> On Nov 21, 2011, at 9:33 AM, Paul Kapinos wrote: >> Hello Open MPI folks, >> >> We use Open MPI 1.5.3 on our pretty new 1800+ node InfiniBand cluster, >> and we have some strange hang-ups when starting Open MPI processes. >> >> The nodes are named linuxbsc001,linuxbsc002,... (with some gaps due to >> offline nodes). Each node is accessible from each other
Re: [OMPI users] How are the Open MPI processes spawned?
On 11/23/2011 2:02 PM, Paul Kapinos wrote: Hello Ralph, hello all, Two news, as usual a good and a bad one. The good: we believe to find out *why* it hangs The bad: it seem for me, this is a bug or at least undocumented feature of Open MPI /1.5.x. In detail: As said, we see mystery hang-ups if starting on some nodes using some permutation of hostnames. Usually removing "some bad" nodes helps, sometimes a permutation of node names in the hostfile is enough(!). The behaviour is reproducible. The machines have at least 2 networks: *eth0* is used for installation, monitoring, ... - this ethernet is very slim *ib0* - is the "IP over IB" interface and is used for everything: the file systems, ssh and so on. The hostnames are bound to the ib0 network; our idea was not to use eth0 for MPI at all. all machines are available from any over ib0 (are in one network). But on eth0 there are at least two different networks; especially the computer linuxbsc025 is in different network than the others and is not reachable from other nodes over eth0! (but reachable over ib0. The name used in the hostfile is resolved to the IP of ib0 ). So I believe that Open MPI /1.5.x tries to communicate over eth0 and cannot do it, and hangs. The /1.4.3 does not hang, so this issue is 1.5.x-specific (seen in 1.5.3 and 1.5.4). A bug? I also tried to disable the eth0 completely: $ mpiexec -mca btl_tcp_if_exclude eth0,lo -mca btl_tcp_if_include ib0 ... I believe if you give "-mca btl_tcp_if_include ib0" you do not need to specify the exclude parameter. ...but this does not help. All right, the above command should disable the usage of eth0 for MPI communication itself, but it hangs just before the MPI is started, isn't it? (because one process lacks, the MPI_INIT cannot be passed) By "just before the MPI is started" do you mean while orte is launching the processes. I wonder if you need to specify "-mca oob_tcp_if_include ib0" also but I think that may depend on which oob you are using. 
Now a question: is there a way to forbid mpiexec to use some interfaces at all? Best wishes, Paul Kapinos P.S. Of course we know about the good idea to bring all nodes into the same net on eth0, but at this point it is impossible due to technical reason[s]... P.S.2 I'm not sure that the issue is really rooted in the above-mentioned misconfiguration of eth0, but I have no better idea at this point... The map seems to be correctly built; also the output of the daemons seems to be the same (see helloworld.txt) Unfortunately, it appears that OMPI was not built with --enable-debug as there is no debug info in the output. Without a debug installation of OMPI, the ability to determine the problem is pretty limited. Well, this will be the next option we will activate. We also have another issue here, on (not) using uDAPL.. You should also try putting that long list of nodes in a hostfile - see if that makes a difference. It will process the nodes thru a different code path, so if there is some problem in --host, this will tell us. No, with a hostfile instead of a host list on the command line the behaviour is the same. But, I just found out that 1.4.3 does *not* hang on this constellation. The next thing I will try will be the installation of 1.5.4 :o) Best, Paul P.S. started: $ /opt/MPI/openmpi-1.5.3/linux/intel/bin/mpiexec --hostfile hostfile-mini -mca odls_base_verbose 5 --leave-session-attached --display-map helloworld 2>&1 | tee helloworld.txt On Nov 21, 2011, at 9:33 AM, Paul Kapinos wrote: Hello Open MPI folks, We use Open MPI 1.5.3 on our pretty new 1800+ node InfiniBand cluster, and we have some strange hang-ups when starting Open MPI processes. The nodes are named linuxbsc001,linuxbsc002,... (with some gaps due to offline nodes). Each node is accessible from each other over SSH (without password), and MPI programs between any two nodes were checked to run.
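The hostfile form Ralph suggested instead of a long --host list can be sketched like this. The node names mirror the thread (on this cluster they resolve to the ib0 / IP-over-IB addresses); the `slots=1` fields and the commented launch line are illustrative assumptions, not taken from the original messages.

```shell
# Sketch: a minimal hostfile replacing the long --host list.
# Node names are illustrative; one slot per node matches the
# one-process-per-node launches described in the thread.
cat > hostfile-mini <<'EOF'
linuxbsc001 slots=1
linuxbsc002 slots=1
linuxbsc025 slots=1
EOF

# The corresponding launch (only meaningful on the actual cluster):
# mpiexec -np 3 --hostfile hostfile-mini \
#     -mca oob_tcp_if_include ib0 -mca btl_tcp_if_include ib0 \
#     ./MPI_FastTest.exe
wc -l hostfile-mini
```

As Ralph notes, the hostfile goes through a different code path than --host, which is why trying both helps isolate the bug.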
So far, I tried to start a bigger number of processes, one process per node:
$ mpiexec -np NN --host linuxbsc001,linuxbsc002,... MPI_FastTest.exe
Now the problem: there are some constellations of names in the host list on which mpiexec reproducibly hangs forever; and more surprising: another *permutation* of the *same* node names may run without any errors! Example: the command in laueft.txt runs OK, the command in haengt.txt hangs. Note: the only difference is that the node linuxbsc025 is put at the end of the host list. Amazed, too? Looking at the particular nodes while the above mpiexec hangs, we found the orted daemons started on *each* node and the binary on all but one node (orted.txt, MPI_FastTest.txt). Again amazing that the node with no user process started (leading to a hang-up in MPI_Init of all processes, I believe) was always the same, linuxbsc005, which is NOT the permuted item linuxbsc025... This behaviour is reproducible. The hang-on only occure
Re: [OMPI users] How are the Open MPI processes spawned?
Hello Ralph, hello all, Two news, as usual a good and a bad one. The good: we believe to find out *why* it hangs The bad: it seem for me, this is a bug or at least undocumented feature of Open MPI /1.5.x. In detail: As said, we see mystery hang-ups if starting on some nodes using some permutation of hostnames. Usually removing "some bad" nodes helps, sometimes a permutation of node names in the hostfile is enough(!). The behaviour is reproducible. The machines have at least 2 networks: *eth0* is used for installation, monitoring, ... - this ethernet is very slim *ib0* - is the "IP over IB" interface and is used for everything: the file systems, ssh and so on. The hostnames are bound to the ib0 network; our idea was not to use eth0 for MPI at all. all machines are available from any over ib0 (are in one network). But on eth0 there are at least two different networks; especially the computer linuxbsc025 is in different network than the others and is not reachable from other nodes over eth0! (but reachable over ib0. The name used in the hostfile is resolved to the IP of ib0 ). So I believe that Open MPI /1.5.x tries to communicate over eth0 and cannot do it, and hangs. The /1.4.3 does not hang, so this issue is 1.5.x-specific (seen in 1.5.3 and 1.5.4). A bug? I also tried to disable the eth0 completely: $ mpiexec -mca btl_tcp_if_exclude eth0,lo -mca btl_tcp_if_include ib0 ... ...but this does not help. All right, the above command should disable the usage of eth0 for MPI communication itself, but it hangs just before the MPI is started, isn't it? (because one process lacks, the MPI_INIT cannot be passed) Now a question: is there a way to forbid the mpiexec to use some interfaces at all? Best wishes, Paul Kapinos P.S. Of course we know about the good idea to bring all nodes into the same net on eth0, but at this point it is impossible due of technical reason[s]... 
P.S.2 I'm not sure that the issue is really rooted in the above mentioned misconfiguration of eth0, but I have no better idea at this point... The map seem to be correctly build, also the output if the daemons seem to be the same (see helloworld.txt) Unfortunately, it appears that OMPI was not built with --enable-debug as there is no debug info in the output. Without a debug installation of OMPI, the ability to determine the problem is pretty limited. well, this will be the next option we will activate. We also have another issue here, on (not) using uDAPL.. You should also try putting that long list of nodes in a hostfile - see if that makes a difference. It will process the nodes thru a different code path, so if there is some problem in --host, this will tell us. No, with the host file instead of host list on command line the behaviour is the same. But, I just found out that the 1.4.3 does *not* hang on this constellation. The next thing I will try will be the installation of 1.5.4 :o) Best, Paul P.S. started: $ /opt/MPI/openmpi-1.5.3/linux/intel/bin/mpiexec --hostfile hostfile-mini -mca odls_base_verbose 5 --leave-session-attached --display-map helloworld 2>&1 | tee helloworld.txt On Nov 21, 2011, at 9:33 AM, Paul Kapinos wrote: Hello Open MPI volks, We use OpenMPI 1.5.3 on our pretty new 1800+ nodes InfiniBand cluster, and we have some strange hangups if starting OpenMPI processes. The nodes are named linuxbsc001,linuxbsc002,... (with some lacuna due of offline nodes). Each node is accessible from each other over SSH (without password), also MPI programs between any two nodes are checked to run. So long, I tried to start some bigger number of processes, one process per node: $ mpiexec -np NN --host linuxbsc001,linuxbsc002,... MPI_FastTest.exe Now the problem: there are some constellations of names in the host list on which mpiexec reproducible hangs forever; and more surprising: other *permutation* of the *same* node names may run without any errors! 
Example: the command in laueft.txt runs OK, the command in haengt.txt hangs. Note: the only difference is that the node linuxbsc025 is put at the end of the host list. Amazed, too? Looking at the particular nodes while the above mpiexec hangs, we found the orted daemons started on *each* node and the binary on all but one node (orted.txt, MPI_FastTest.txt). Again amazing that the node with no user process started (leading to a hang-up in MPI_Init of all processes, I believe) was always the same, linuxbsc005, which is NOT the permuted item linuxbsc025... This behaviour is reproducible. The hang-up only occurs if the started application is an MPI application ("hostname" does not hang). Any idea what is going on? Best, Paul Kapinos P.S: no alias names used, all names are real ones -- Dipl.-Inform. Paul Kapinos - High Performance Computing, RWTH Aachen University, Center for Computing and Communication Seffenter Weg 23, D 52074 Aachen (Germany) Tel: +49
Re: [OMPI users] How are the Open MPI processes spawned?
On Nov 22, 2011, at 10:10 AM, Paul Kapinos wrote: > Hello Ralph, hello all. > >> No real ideas, I'm afraid. We regularly launch much larger jobs than that >> using ssh without problem, > I was also able to run a 288-node-job yesterday - the size alone is not the > problem... > > > >> so it is likely something about the local setup of that node that is causing >> the problem. Offhand, it sounds like either the mapper isn't getting things >> right, or for some reason the daemon on 005 isn't properly getting or >> processing the launch command. >> What you could try is adding --display-map to see if the map is being >> correctly generated. > > If that works, then (using a debug build) try adding > > --leave-session-attached and see if > > any daemons are outputting an error. >> You could add -mca odls_base_verbose 5 --leave-session-attached to your cmd >> line. > > You'll see debug output from each daemon as it receives and processes >> the launch command. See if the daemon on 005 is behaving differently than >> the others. > > I've tried the options. > The map seem to be correctly build, also the output if the daemons seem to be > the same (see helloworld.txt) Unfortunately, it appears that OMPI was not built with --enable-debug as there is no debug info in the output. Without a debug installation of OMPI, the ability to determine the problem is pretty limited. > >> You should also try putting that long list of nodes in a hostfile - see if >> that makes a difference. > > It will process the nodes thru a different code path, so if there is some > > problem in --host, >> this will tell us. > > No, with the host file instead of host list on command line the behaviour is > the same. > > But, I just found out that the 1.4.3 does *not* hang on this constellation. > The next thing I will try will be the installation of 1.5.4 :o) > > Best, > > Paul > > P.S. 
started: > > $ /opt/MPI/openmpi-1.5.3/linux/intel/bin/mpiexec --hostfile hostfile-mini > -mca odls_base_verbose 5 --leave-session-attached --display-map helloworld > 2>&1 | tee helloworld.txt > > > >> On Nov 21, 2011, at 9:33 AM, Paul Kapinos wrote: >>> Hello Open MPI volks, >>> >>> We use OpenMPI 1.5.3 on our pretty new 1800+ nodes InfiniBand cluster, and >>> we have some strange hangups if starting OpenMPI processes. >>> >>> The nodes are named linuxbsc001,linuxbsc002,... (with some lacuna due of >>> offline nodes). Each node is accessible from each other over SSH (without >>> password), also MPI programs between any two nodes are checked to run. >>> >>> >>> So long, I tried to start some bigger number of processes, one process per >>> node: >>> $ mpiexec -np NN --host linuxbsc001,linuxbsc002,... MPI_FastTest.exe >>> >>> Now the problem: there are some constellations of names in the host list on >>> which mpiexec reproducible hangs forever; and more surprising: other >>> *permutation* of the *same* node names may run without any errors! >>> >>> Example: the command in laueft.txt runs OK, the command in haengt.txt >>> hangs. Note: the only difference is that the node linuxbsc025 is put on the >>> end of the host list. Amazed, too? >>> >>> Looking on the particular nodes during the above mpiexec hangs, we found >>> the orted daemons started on *each* node and the binary on all but one node >>> (orted.txt, MPI_FastTest.txt). >>> Again amazing that the node with no user process started (leading to hangup >>> in MPI_Init of all processes and thus to hangup, I believe) was always the >>> same, linuxbsc005, which is NOT the permuted item linuxbsc025... >>> >>> This behaviour is reproducible. The hang-on only occure if the started >>> application is a MPI application ("hostname" does not hang). >>> >>> >>> Any Idea what is gonna on? 
>>> >>> >>> Best, >>> >>> Paul Kapinos >>> >>> >>> P.S: no alias names used, all names are real ones >>> >>> >>> >>> >>> >>> >>> >>> -- >>> Dipl.-Inform. Paul Kapinos - High Performance Computing, >>> RWTH Aachen University, Center for Computing and Communication >>> Seffenter Weg 23, D 52074 Aachen (Germany) >>> Tel: +49 241/80-24915 >>> linuxbsc001: STDOUT: 24323 ?SLl0:00 MPI_FastTest.exe >>> linuxbsc002: STDOUT: 2142 ?SLl0:00 MPI_FastTest.exe >>> linuxbsc003: STDOUT: 69266 ?SLl0:00 MPI_FastTest.exe >>> linuxbsc004: STDOUT: 58899 ?SLl0:00 MPI_FastTest.exe >>> linuxbsc006: STDOUT: 68255 ?SLl0:00 MPI_FastTest.exe >>> linuxbsc007: STDOUT: 62026 ?SLl0:00 MPI_FastTest.exe >>> linuxbsc008: STDOUT: 54221 ?SLl0:00 MPI_FastTest.exe >>> linuxbsc009: STDOUT: 55482 ?SLl0:00 MPI_FastTest.exe >>> linuxbsc010: STDOUT: 59380 ?SLl0:00 MPI_FastTest.exe >>> linuxbsc011: STDOUT: 58312 ?SLl0:00 MPI_FastTest.exe >>> linuxbsc014: STDOUT: 56013 ?SLl0:00 MPI_FastTest.exe >>> linuxbsc016: STDOUT: 58563 ?SLl0:00 MPI_FastTest.exe >>>
Re: [OMPI users] How are the Open MPI processes spawned?
Hello Ralph, hello all. No real ideas, I'm afraid. We regularly launch much larger jobs than that using ssh without problem, I was also able to run a 288-node-job yesterday - the size alone is not the problem... so it is likely something about the local setup of that node that is causing the problem. Offhand, it sounds like either the mapper isn't getting things right, or for some reason the daemon on 005 isn't properly getting or processing the launch command. What you could try is adding --display-map to see if the map is being correctly generated. > If that works, then (using a debug build) try adding --leave-session-attached and see if > any daemons are outputting an error. You could add -mca odls_base_verbose 5 --leave-session-attached to your cmd line. > You'll see debug output from each daemon as it receives and processes the launch command. See if the daemon on 005 is behaving differently than the others. I've tried the options. The map seem to be correctly build, also the output if the daemons seem to be the same (see helloworld.txt) You should also try putting that long list of nodes in a hostfile - see if that makes a difference. > It will process the nodes thru a different code path, so if there is some problem in --host, this will tell us. No, with the host file instead of host list on command line the behaviour is the same. But, I just found out that the 1.4.3 does *not* hang on this constellation. The next thing I will try will be the installation of 1.5.4 :o) Best, Paul P.S. started: $ /opt/MPI/openmpi-1.5.3/linux/intel/bin/mpiexec --hostfile hostfile-mini -mca odls_base_verbose 5 --leave-session-attached --display-map helloworld 2>&1 | tee helloworld.txt On Nov 21, 2011, at 9:33 AM, Paul Kapinos wrote: Hello Open MPI volks, We use OpenMPI 1.5.3 on our pretty new 1800+ nodes InfiniBand cluster, and we have some strange hangups if starting OpenMPI processes. The nodes are named linuxbsc001,linuxbsc002,... (with some lacuna due of offline nodes). 
Each node is accessible from each other over SSH (without password), also MPI programs between any two nodes are checked to run. So long, I tried to start some bigger number of processes, one process per node: $ mpiexec -np NN --host linuxbsc001,linuxbsc002,... MPI_FastTest.exe Now the problem: there are some constellations of names in the host list on which mpiexec reproducible hangs forever; and more surprising: other *permutation* of the *same* node names may run without any errors! Example: the command in laueft.txt runs OK, the command in haengt.txt hangs. Note: the only difference is that the node linuxbsc025 is put on the end of the host list. Amazed, too? Looking on the particular nodes during the above mpiexec hangs, we found the orted daemons started on *each* node and the binary on all but one node (orted.txt, MPI_FastTest.txt). Again amazing that the node with no user process started (leading to hangup in MPI_Init of all processes and thus to hangup, I believe) was always the same, linuxbsc005, which is NOT the permuted item linuxbsc025... This behaviour is reproducible. The hang-on only occure if the started application is a MPI application ("hostname" does not hang). Any Idea what is gonna on? Best, Paul Kapinos P.S: no alias names used, all names are real ones -- Dipl.-Inform. 
Paul Kapinos - High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23, D 52074 Aachen (Germany)
Tel: +49 241/80-24915
linuxbsc001: STDOUT: 24323 ? SLl 0:00 MPI_FastTest.exe
linuxbsc002: STDOUT:  2142 ? SLl 0:00 MPI_FastTest.exe
linuxbsc003: STDOUT: 69266 ? SLl 0:00 MPI_FastTest.exe
linuxbsc004: STDOUT: 58899 ? SLl 0:00 MPI_FastTest.exe
linuxbsc006: STDOUT: 68255 ? SLl 0:00 MPI_FastTest.exe
linuxbsc007: STDOUT: 62026 ? SLl 0:00 MPI_FastTest.exe
linuxbsc008: STDOUT: 54221 ? SLl 0:00 MPI_FastTest.exe
linuxbsc009: STDOUT: 55482 ? SLl 0:00 MPI_FastTest.exe
linuxbsc010: STDOUT: 59380 ? SLl 0:00 MPI_FastTest.exe
linuxbsc011: STDOUT: 58312 ? SLl 0:00 MPI_FastTest.exe
linuxbsc014: STDOUT: 56013 ? SLl 0:00 MPI_FastTest.exe
linuxbsc016: STDOUT: 58563 ? SLl 0:00 MPI_FastTest.exe
linuxbsc017: STDOUT: 54693 ? SLl 0:00 MPI_FastTest.exe
linuxbsc018: STDOUT: 54187 ? SLl 0:00 MPI_FastTest.exe
linuxbsc020: STDOUT: 55811 ? SLl 0:00 MPI_FastTest.exe
linuxbsc021: STDOUT: 54982 ? SLl 0:00 MPI_FastTest.exe
linuxbsc022: STDOUT: 50032 ? SLl 0:00 MPI_FastTest.exe
linuxbsc023: STDOUT: 54044 ? SLl 0:00 MPI_FastTest.exe
linuxbsc024: STDOUT: 51247 ? SLl 0:00 MPI_FastTest.exe
linuxbsc025: STDOUT: 18575 ? SLl 0:00 MPI_FastTest.exe
linuxbsc027: STDOUT: 48969 ? SLl 0:00 MPI_FastTest.exe
linuxbsc028: STDOUT: 52397 ? SLl 0:00
Re: [OMPI users] How are the Open MPI processes spawned?
No real ideas, I'm afraid. We regularly launch much larger jobs than that using ssh without problem, so it is likely something about the local setup of that node that is causing the problem. Offhand, it sounds like either the mapper isn't getting things right, or for some reason the daemon on 005 isn't properly getting or processing the launch command. What you could try is adding --display-map to see if the map is being correctly generated. If that works, then (using a debug build) try adding --leave-session-attached and see if any daemons are outputting an error. You could add -mca odls_base_verbose 5 --leave-session-attached to your cmd line. You'll see debug output from each daemon as it receives and processes the launch command. See if the daemon on 005 is behaving differently than the others. You should also try putting that long list of nodes in a hostfile - see if that makes a difference. It will process the nodes thru a different code path, so if there is some problem in --host, this will tell us. On Nov 21, 2011, at 9:33 AM, Paul Kapinos wrote: > Hello Open MPI volks, > > We use OpenMPI 1.5.3 on our pretty new 1800+ nodes InfiniBand cluster, and we > have some strange hangups if starting OpenMPI processes. > > The nodes are named linuxbsc001,linuxbsc002,... (with some lacuna due of > offline nodes). Each node is accessible from each other over SSH (without > password), also MPI programs between any two nodes are checked to run. > > > So long, I tried to start some bigger number of processes, one process per > node: > $ mpiexec -np NN --host linuxbsc001,linuxbsc002,... MPI_FastTest.exe > > Now the problem: there are some constellations of names in the host list on > which mpiexec reproducible hangs forever; and more surprising: other > *permutation* of the *same* node names may run without any errors! > > Example: the command in laueft.txt runs OK, the command in haengt.txt hangs. 
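Ralph's debugging suggestions above can be assembled into one invocation. The sketch only constructs and prints the command line (running it requires the cluster from the thread); the `hostfile-mini` and `helloworld` names are taken from Paul's later messages.

```shell
# Sketch: assembling the debug launch Ralph suggests, piece by piece.
#   --display-map            print the process-to-node map mpiexec built
#   --leave-session-attached keep the ssh sessions open so daemon stderr
#                            makes it back to mpiexec
#   -mca odls_base_verbose 5 verbose output from each daemon as it
#                            receives and processes the launch command
DEBUG_OPTS="-mca odls_base_verbose 5 --leave-session-attached --display-map"
CMD="mpiexec $DEBUG_OPTS --hostfile hostfile-mini helloworld"
# Only print here; running it requires the actual cluster.
echo "$CMD"
```

Note that the daemon output is only useful if Open MPI was configured with --enable-debug, as Ralph points out later in the thread.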
> Note: the only difference is that the node linuxbsc025 is moved to the end of the host list. Amazed, too?
>
> Looking at the individual nodes while the above mpiexec hangs, we found the orted daemons started on *each* node, but the binary on all but one node (orted.txt, MPI_FastTest.txt).
> Also amazing: the node with no user process started (leading, I believe, to the hang in MPI_Init of all processes) was always the same, linuxbsc005, which is NOT the permuted item linuxbsc025...
>
> This behaviour is reproducible. The hang-up only occurs if the started application is an MPI application ("hostname" does not hang).
>
> Any idea what is going on?
>
> Best,
> Paul Kapinos
>
> P.S.: no alias names used, all names are real ones
>
> --
> Dipl.-Inform. Paul Kapinos - High Performance Computing,
> RWTH Aachen University, Center for Computing and Communication
> Seffenter Weg 23, D 52074 Aachen (Germany)
> Tel: +49 241/80-24915
>
> linuxbsc001: STDOUT: 24323 ? SLl 0:00 MPI_FastTest.exe
> linuxbsc002: STDOUT:  2142 ? SLl 0:00 MPI_FastTest.exe
> linuxbsc003: STDOUT: 69266 ? SLl 0:00 MPI_FastTest.exe
> linuxbsc004: STDOUT: 58899 ? SLl 0:00 MPI_FastTest.exe
> linuxbsc006: STDOUT: 68255 ? SLl 0:00 MPI_FastTest.exe
> linuxbsc007: STDOUT: 62026 ? SLl 0:00 MPI_FastTest.exe
> linuxbsc008: STDOUT: 54221 ? SLl 0:00 MPI_FastTest.exe
> linuxbsc009: STDOUT: 55482 ? SLl 0:00 MPI_FastTest.exe
> linuxbsc010: STDOUT: 59380 ? SLl 0:00 MPI_FastTest.exe
> linuxbsc011: STDOUT: 58312 ? SLl 0:00 MPI_FastTest.exe
> linuxbsc014: STDOUT: 56013 ? SLl 0:00 MPI_FastTest.exe
> linuxbsc016: STDOUT: 58563 ? SLl 0:00 MPI_FastTest.exe
> linuxbsc017: STDOUT: 54693 ? SLl 0:00 MPI_FastTest.exe
> linuxbsc018: STDOUT: 54187 ? SLl 0:00 MPI_FastTest.exe
> linuxbsc020: STDOUT: 55811 ? SLl 0:00 MPI_FastTest.exe
> linuxbsc021: STDOUT: 54982 ? SLl 0:00 MPI_FastTest.exe
> linuxbsc022: STDOUT: 50032 ? SLl 0:00 MPI_FastTest.exe
> linuxbsc023: STDOUT: 54044 ? SLl 0:00 MPI_FastTest.exe
> linuxbsc024: STDOUT: 51247 ? SLl 0:00 MPI_FastTest.exe
> linuxbsc025: STDOUT: 18575 ? SLl 0:00 MPI_FastTest.exe
> linuxbsc027: STDOUT: 48969 ? SLl 0:00 MPI_FastTest.exe
> linuxbsc028: STDOUT: 52397 ? SLl 0:00 MPI_FastTest.exe
> linuxbsc029: STDOUT: 52780 ? SLl 0:00 MPI_FastTest.exe
> linuxbsc030: STDOUT: 47537 ? SLl 0:00 MPI_FastTest.exe
> linuxbsc031: STDOUT: 54609 ? SLl 0:00 MPI_FastTest.exe
> linuxbsc032: STDOUT: 52833 ? SLl 0:00 MPI_FastTest.exe
>
> $ timex /opt/MPI/openmpi-1.5.3/linux/intel/bin/mpiexec -np 27 --host
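The diagnostic steps suggested above can be sketched as shell commands. This is only a sketch: the flags are the ones named in the reply (Open MPI 1.5-era mpiexec options), the hostfile name myhosts.txt and the node subset are made up for illustration, and the mpiexec line is printed rather than run since it needs a live cluster.

```shell
# Put the long --host list into a hostfile instead; as noted above, this
# exercises a different code path in the mapper than --host does.
printf 'linuxbsc%03d\n' 1 2 3 4 5 25 > myhosts.txt

# Diagnostic flags to try (shown, not executed here):
#   --display-map            print the computed process map before launch
#   --leave-session-attached keep daemon stderr visible (debug builds)
#   -mca odls_base_verbose 5 per-daemon output while processing the launch
cat <<'EOF'
mpiexec -np 6 --hostfile myhosts.txt --display-map \
    --leave-session-attached -mca odls_base_verbose 5 MPI_FastTest.exe
EOF
```

If the map printed by --display-map looks right but one daemon's odls output differs from the rest, that daemon's node (005 in this thread) is the one to inspect.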
[OMPI users] How are the Open MPI processes spawned?
Hello Open MPI folks,

We use OpenMPI 1.5.3 on our pretty new 1800+ node InfiniBand cluster, and we see some strange hangups when starting Open MPI processes.

The nodes are named linuxbsc001, linuxbsc002, ... (with some gaps due to offline nodes). Each node is reachable from every other over SSH (without a password), and MPI programs between any two nodes have been verified to run.

I then tried to start a larger number of processes, one process per node:

$ mpiexec -np NN --host linuxbsc001,linuxbsc002,... MPI_FastTest.exe

Now the problem: for some orderings of names in the host list, mpiexec reproducibly hangs forever; more surprisingly, another *permutation* of the *same* node names may run without any errors!

Example: the command in laueft.txt runs OK, the command in haengt.txt hangs. Note: the only difference is that the node linuxbsc025 is moved to the end of the host list. Amazed, too?

Looking at the individual nodes while the above mpiexec hangs, we found the orted daemons started on *each* node, but the binary on all but one node (orted.txt, MPI_FastTest.txt). Also amazing: the node with no user process started (leading, I believe, to the hang in MPI_Init of all processes) was always the same, linuxbsc005, which is NOT the permuted item linuxbsc025...

This behaviour is reproducible. The hang-up only occurs if the started application is an MPI application ("hostname" does not hang).

Any idea what is going on?

Best,
Paul Kapinos

P.S.: no alias names used, all names are real ones

--
Dipl.-Inform.
Paul Kapinos - High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23, D 52074 Aachen (Germany)
Tel: +49 241/80-24915

linuxbsc001: STDOUT: 24323 ? SLl 0:00 MPI_FastTest.exe
linuxbsc002: STDOUT:  2142 ? SLl 0:00 MPI_FastTest.exe
linuxbsc003: STDOUT: 69266 ? SLl 0:00 MPI_FastTest.exe
linuxbsc004: STDOUT: 58899 ? SLl 0:00 MPI_FastTest.exe
linuxbsc006: STDOUT: 68255 ? SLl 0:00 MPI_FastTest.exe
linuxbsc007: STDOUT: 62026 ? SLl 0:00 MPI_FastTest.exe
linuxbsc008: STDOUT: 54221 ? SLl 0:00 MPI_FastTest.exe
linuxbsc009: STDOUT: 55482 ? SLl 0:00 MPI_FastTest.exe
linuxbsc010: STDOUT: 59380 ? SLl 0:00 MPI_FastTest.exe
linuxbsc011: STDOUT: 58312 ? SLl 0:00 MPI_FastTest.exe
linuxbsc014: STDOUT: 56013 ? SLl 0:00 MPI_FastTest.exe
linuxbsc016: STDOUT: 58563 ? SLl 0:00 MPI_FastTest.exe
linuxbsc017: STDOUT: 54693 ? SLl 0:00 MPI_FastTest.exe
linuxbsc018: STDOUT: 54187 ? SLl 0:00 MPI_FastTest.exe
linuxbsc020: STDOUT: 55811 ? SLl 0:00 MPI_FastTest.exe
linuxbsc021: STDOUT: 54982 ? SLl 0:00 MPI_FastTest.exe
linuxbsc022: STDOUT: 50032 ? SLl 0:00 MPI_FastTest.exe
linuxbsc023: STDOUT: 54044 ? SLl 0:00 MPI_FastTest.exe
linuxbsc024: STDOUT: 51247 ? SLl 0:00 MPI_FastTest.exe
linuxbsc025: STDOUT: 18575 ? SLl 0:00 MPI_FastTest.exe
linuxbsc027: STDOUT: 48969 ? SLl 0:00 MPI_FastTest.exe
linuxbsc028: STDOUT: 52397 ? SLl 0:00 MPI_FastTest.exe
linuxbsc029: STDOUT: 52780 ? SLl 0:00 MPI_FastTest.exe
linuxbsc030: STDOUT: 47537 ? SLl 0:00 MPI_FastTest.exe
linuxbsc031: STDOUT: 54609 ? SLl 0:00 MPI_FastTest.exe
linuxbsc032: STDOUT: 52833 ? SLl 0:00 MPI_FastTest.exe

$ timex /opt/MPI/openmpi-1.5.3/linux/intel/bin/mpiexec -np 27 --host linuxbsc001,linuxbsc002,linuxbsc003,linuxbsc004,linuxbsc005,linuxbsc006,linuxbsc007,linuxbsc008,linuxbsc009,linuxbsc010,linuxbsc011,linuxbsc014,linuxbsc016,linuxbsc017,linuxbsc018,linuxbsc020,linuxbsc021,linuxbsc022,linuxbsc023,linuxbsc024,linuxbsc025,linuxbsc027,linuxbsc028,linuxbsc029,linuxbsc030,linuxbsc031,linuxbsc032 MPI_FastTest.exe

$ timex /opt/MPI/openmpi-1.5.3/linux/intel/bin/mpiexec -np 27 --host linuxbsc001,linuxbsc002,linuxbsc003,linuxbsc004,linuxbsc005,linuxbsc006,linuxbsc007,linuxbsc008,linuxbsc009,linuxbsc010,linuxbsc011,linuxbsc014,linuxbsc016,linuxbsc017,linuxbsc018,linuxbsc020,linuxbsc021,linuxbsc022,linuxbsc023,linuxbsc024,linuxbsc027,linuxbsc028,linuxbsc029,linuxbsc030,linuxbsc031,linuxbsc032,linuxbsc025 MPI_FastTest.exe

linuxbsc001: STDOUT: 24322 ? Ss 0:00 /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 28 --hnp-uri 751435776.0;tcp://134.61.194.2:33210 -mca plm rsh
linuxbsc002: STDOUT:  2141 ? Ss 0:00 /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 2 -mca orte_ess_num_procs 28 --hnp-uri 751435776.0;tcp://134.61.194.2:33210 -mca plm rsh
linuxbsc003: STDOUT: 69265 ? Ss 0:00 /opt/MPI/openmpi-1.5.3/linux/intel/bin/orted --daemonize -mca ess env -mca orte_ess_jobid 751435776 -mca orte_ess_vpid 3 -mca
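A quick way to find which node is missing the user process in listings like the above is to diff the expected node list against the nodes that actually report the binary. This is a sketch with hypothetical file names (expected.txt, listing.txt) and a trimmed copy of the data; on the real cluster the listing would be collected over ssh.

```shell
# Expected nodes, one per line, sorted (as required by comm)
printf 'linuxbsc%03d\n' 1 2 3 4 5 6 | sort > expected.txt

# Trimmed per-node ps listing: linuxbsc005 has an orted running
# but no MPI_FastTest.exe process, matching the hang described above
cat > listing.txt <<'EOF'
linuxbsc001: STDOUT: 24323 ? SLl 0:00 MPI_FastTest.exe
linuxbsc002: STDOUT:  2142 ? SLl 0:00 MPI_FastTest.exe
linuxbsc003: STDOUT: 69266 ? SLl 0:00 MPI_FastTest.exe
linuxbsc004: STDOUT: 58899 ? SLl 0:00 MPI_FastTest.exe
linuxbsc006: STDOUT: 68255 ? SLl 0:00 MPI_FastTest.exe
EOF

# Nodes that report the binary (first colon-separated field)
awk -F: '/MPI_FastTest.exe/ {print $1}' listing.txt | sort > running.txt

# Nodes expected but not running the binary
comm -23 expected.txt running.txt   # → linuxbsc005
```

With 27 nodes this immediately singles out linuxbsc005 instead of eyeballing the listing.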