Re: [OMPI users] long initialization
In OMPI 1.9a1r32604 I get much better results:

$ time mpirun --mca oob_tcp_if_include ib0 -np 1 ./hello_c
Hello, world, I am 0 of 1, (Open MPI v1.9a1, package: Open MPI semenov@compiler-2 Distribution, ident: 1.9a1r32604, repo rev: r32604, Aug 26, 2014 (nightly snapshot tarball), 146)

real 0m4.166s
user 0m0.034s
sys 0m0.079s

Thu, 28 Aug 2014 13:10:02 +0400 from Timur Ismagilov:
>I enclose two files with the output of the two following commands (OMPI 1.9a1r32570):
>$ time mpirun --leave-session-attached -mca oob_base_verbose 100 -np 1 ./hello_c >& out1.txt
>(prints "Hello, world, I am ...")
>real 1m3.952s
>user 0m0.035s
>sys 0m0.107s
>$ time mpirun --leave-session-attached -mca oob_base_verbose 100 --mca oob_tcp_if_include ib0 -np 1 ./hello_c >& out2.txt
>(no "Hello, world, I am ..." output)
>real 0m9.337s
>user 0m0.059s
>sys 0m0.098s
>Wed, 27 Aug 2014 06:31:02 -0700 from Ralph Castain:
>>How bizarre. Please add "--leave-session-attached -mca oob_base_verbose 100" to your cmd line.
>>
>>On Aug 27, 2014, at 4:31 AM, Timur Ismagilov < tismagi...@mail.ru > wrote:
>>>When I try to specify the OOB interface with --mca oob_tcp_if_include <interface from ifconfig>, I always get an error:
>>>$ mpirun --mca oob_tcp_if_include ib0 -np 1 ./hello_c
>>>--
>>>An ORTE daemon has unexpectedly failed after launch and before
>>>communicating back to mpirun. This could be caused by a number
>>>of factors, including an inability to create a connection back
>>>to mpirun due to a lack of common network interfaces and/or no
>>>route found between them. Please check network connectivity
>>>(including firewalls and network routing requirements).
>>>-
>>>
>>>Earlier, with OMPI 1.8.1, I could not run MPI jobs without "--mca oob_tcp_if_include ib0"... but now (OMPI 1.9a1) with this flag I get the above error.
>>> >>>Here is an output of ifconfig >>>$ ifconfig >>>eth1 Link encap:Ethernet HWaddr 00:15:17:EE:89:E1 >>>inet addr:10.0.251.53 Bcast:10.0.251.255 Mask:255.255.255.0 >>>UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 >>>RX packets:215087433 errors:0 dropped:0 overruns:0 frame:0 >>>TX packets:2648 errors:0 dropped:0 overruns:0 carrier:0 >>>collisions:0 txqueuelen:1000 >>>RX bytes:26925754883 (25.0 GiB) TX bytes:137971 (134.7 KiB) >>>Memory:b2c0-b2c2 >>>eth2 Link encap:Ethernet HWaddr 00:02:C9:04:73:F8 >>>inet addr:10.0.0.4 Bcast:10.0.0.255 Mask:255.255.255.0 >>>UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 >>>RX packets:4892833125 errors:0 dropped:0 overruns:0 frame:0 >>>TX packets:8708606918 errors:0 dropped:0 overruns:0 carrier:0 >>>collisions:0 txqueuelen:1000 >>>RX bytes:1823986502132 (1.6 TiB) TX bytes:11957754120037 (10.8 TiB) >>>eth2.911 Link encap:Ethernet HWaddr 00:02:C9:04:73:F8 >>>inet addr:93.180.7.38 Bcast:93.180.7.63 Mask:255.255.255.224 >>>UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 >>>RX packets:3746454225 errors:0 dropped:0 overruns:0 frame:0 >>>TX packets:1131917608 errors:0 dropped:3 overruns:0 carrier:0 >>>collisions:0 txqueuelen:0 >>>RX bytes:285174723322 (265.5 GiB) TX bytes:11523163526058 (10.4 TiB) >>>eth3 Link encap:Ethernet HWaddr 00:02:C9:04:73:F9 >>>inet addr:10.2.251.14 Bcast:10.2.251.255 Mask:255.255.255.0 >>>UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 >>>RX packets:591156692 errors:0 dropped:56 overruns:56 frame:56 >>>TX packets:679729229 errors:0 dropped:0 overruns:0 carrier:0 >>>collisions:0 txqueuelen:1000 >>>RX bytes:324195989293 (301.9 GiB) TX bytes:770299202886 (717.3 GiB) >>>Ifconfig uses the ioctl access method to get the full address information, >>>which limits hardware addresses to 8 bytes. >>>Because Infiniband address has 20 bytes, only the first 8 bytes are >>>displayed correctly. >>>Ifconfig is obsolete! For replacement check ip. 
>>>ib0 Link encap:InfiniBand HWaddr >>>80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00 >>>inet addr:10.128.0.4 Bcast:10.128.255.255 Mask:255.255.0.0 >>>UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1 >>>RX packets:10843859 errors:0 dropped:0 overruns:0 frame:0 >>>TX packets:8089839 errors:0 dropped:15 overruns:0 carrier:0 >>>collisions:0 txqueuelen:1024 >>>RX bytes:939249464 (895.7 MiB) TX bytes:886054008 (845.0 MiB) >>>lo Link encap:Local Loopback >>>inet addr:127.0.0.1 Mask:255.0.0.0 >>>UP LOOPBACK RUNNING MTU:16436 Metric:1 >>>RX packets:31235107 errors:0 dropped:0 overruns:0 frame:0 >>>TX packets:31235107 errors:0 dropped:0 overruns:0 carrier:0 >>>collisions:0 txqueuelen:0 >>>RX bytes:132750916041 (123.6 GiB) TX bytes:132750916041 (123.6 GiB) >>> >>> >>> >>>Tue, 26 Aug 2014 09:48:35 -0700 от Ralph Castain < r...@open-mpi.org >: I think something may be messed up with your installation. I went ahead and tested this on a Slurm 2.5.4 cluster, and got the following: $ time mpirun -np 1 --host bend001 ./hello Hello, World, I am 0 of 1
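The thread keeps comparing `real` values from the shell's `time` builtin (1m3.952s for the slow startup vs 0m4.166s after fixing the OOB interface). A small helper, hypothetical and not part of Open MPI, converts that minutes/seconds format into plain seconds so runs can be compared numerically:

```shell
# Convert a value like "1m3.952s" (the format printed by the shell's
# built-in `time`) into plain seconds. Hypothetical helper, POSIX sh.
to_seconds() {
    mins=${1%m*}     # part before the "m"
    secs=${1#*m}     # part after the "m"
    secs=${secs%s}   # strip the trailing "s"
    # POSIX sh has no floating-point arithmetic, so sum in awk.
    awk -v m="$mins" -v s="$secs" 'BEGIN { printf "%.3f\n", m * 60 + s }'
}

to_seconds 1m3.952s   # the slow startup reported in the thread
to_seconds 0m4.166s   # after selecting ib0 for the OOB
```

Run against the two times above, the slow case comes out roughly fifteen times longer than the fixed one.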
Re: [OMPI users] long initialization
I enclose two files with the output of the two following commands (OMPI 1.9a1r32570):

$ time mpirun --leave-session-attached -mca oob_base_verbose 100 -np 1 ./hello_c >& out1.txt
(prints "Hello, world, I am ...")
real 1m3.952s
user 0m0.035s
sys 0m0.107s

$ time mpirun --leave-session-attached -mca oob_base_verbose 100 --mca oob_tcp_if_include ib0 -np 1 ./hello_c >& out2.txt
(no "Hello, world, I am ..." output)
real 0m9.337s
user 0m0.059s
sys 0m0.098s

Wed, 27 Aug 2014 06:31:02 -0700 from Ralph Castain:
>How bizarre. Please add "--leave-session-attached -mca oob_base_verbose 100" to your cmd line.
>
>On Aug 27, 2014, at 4:31 AM, Timur Ismagilov < tismagi...@mail.ru > wrote:
>>When I try to specify the OOB interface with --mca oob_tcp_if_include <interface from ifconfig>, I always get an error:
>>$ mpirun --mca oob_tcp_if_include ib0 -np 1 ./hello_c
>>--
>>An ORTE daemon has unexpectedly failed after launch and before communicating back to mpirun. This could be caused by a number of factors, including an inability to create a connection back to mpirun due to a lack of common network interfaces and/or no route found between them. Please check network connectivity (including firewalls and network routing requirements).
>>-
>>
>>Earlier, with OMPI 1.8.1, I could not run MPI jobs without "--mca oob_tcp_if_include ib0"... but now (OMPI 1.9a1) with this flag I get the above error.
>>
>>Tue, 26 Aug 2014 09:48:35 -0700 from Ralph Castain < r...@open-mpi.org >:
>>>I think something may be messed up with your installation. I went ahead and tested this on a Slurm 2.5.4 cluster, and got the following:
>>>
>>>$ time mpirun -np 1 --host bend001 ./hello
>>>Hello, World, I am 0 of 1 [0 local peers]: get_cpubind: 0 bitmap 0,12
>>>
>>>real 0m0.086s
>>>user 0m0.039s
>>>sys 0m0.046s
>>>
>>>$ time mpirun -np 1 --host bend002 ./hello
>>>Hello, World, I am 0 of 1 [0 local peers]: get_cpubind: 0 bitmap 0,12
>>>
>>>real 0m0.528s
>>>user 0m0.021s
>>>sys 0m0.023s
>>>
>>>Which is what I would have expected. With --host set to the local host, no daemons are being launched and so the time is quite short (just spent mapping and fork/exec). With --host set to a single remote host,
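Since ifconfig truncates the 20-byte InfiniBand hardware address (as noted earlier in the thread) and `ip` is the suggested replacement, here is a sketch of listing candidate names for `--mca oob_tcp_if_include` from `ip`-style output. The sample text is adapted from the addresses in this thread so the sketch stays self-contained; on a real node you would pipe from `ip -o -4 addr show` instead:

```shell
# Captured sample in the one-line format of `ip -o -4 addr show`;
# the second column is the interface name we want.
sample='2: eth1    inet 10.0.251.53/24 brd 10.0.251.255 scope global eth1
3: eth2    inet 10.0.0.4/24 brd 10.0.0.255 scope global eth2
5: ib0     inet 10.128.0.4/16 brd 10.128.255.255 scope global ib0'

# Print candidate interface names for --mca oob_tcp_if_include.
printf '%s\n' "$sample" | awk '{ print $2 }'
```

Any of the printed names (here eth1, eth2, ib0) is a valid value for oob_tcp_if_include, provided mpirun and the remote daemons can actually route to each other over it.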
Re: [OMPI users] long initialization
How bizarre. Please add "--leave-session-attached -mca oob_base_verbose 100" to your cmd line.

On Aug 27, 2014, at 4:31 AM, Timur Ismagilov wrote:
> When I try to specify the OOB interface with --mca oob_tcp_if_include <interface from ifconfig>, I always get an error:
>
> $ mpirun --mca oob_tcp_if_include ib0 -np 1 ./hello_c
> --
> An ORTE daemon has unexpectedly failed after launch and before
> communicating back to mpirun. This could be caused by a number
> of factors, including an inability to create a connection back
> to mpirun due to a lack of common network interfaces and/or no
> route found between them. Please check network connectivity
> (including firewalls and network routing requirements).
> -
>
> Earlier, with OMPI 1.8.1, I could not run MPI jobs without "--mca oob_tcp_if_include ib0"... but now (OMPI 1.9a1) with this flag I get the above error.
>
>
> Tue, 26 Aug 2014 09:48:35 -0700 from Ralph Castain:
>
> I think something may be messed up with your installation.
I went ahead and > tested this on a Slurm 2.5.4 cluster, and got the following: > > $ time mpirun -np 1 --host bend001 ./hello > Hello, World, I am 0 of 1 [0 local peers]: get_cpubind: 0 bitmap 0,12 > > real 0m0.086s > user 0m0.039s > sys 0m0.046s > > $ time mpirun -np 1 --host bend002 ./hello > Hello, World, I am 0 of 1 [0 local peers]: get_cpubind: 0 bitmap 0,12 > > real 0m0.528s > user 0m0.021s > sys 0m0.023s > > Which is what I would have expected. With --host set to the local host, no > daemons are being launched and so the time is quite short (just spent mapping > and fork/exec). With --host set to a single remote host, you have the time it > takes Slurm to launch our daemon on the remote host, so you get about half of > a second. > > IIRC, you were having some problems with the OOB setup. If you specify the > TCP interface to use, does your time come down? > > > On Aug 26, 2014, at 8:32 AM, Timur Ismagilov wrote: > >> I'm using slurm 2.5.6 >> >> $salloc -N8 --exclusive -J ompi -p test >> >> $ srun hostname >> node1-128-21 >> node1-128-24 >> node1-128-22 >> node1-128-26 >> node1-128-27
Re: [OMPI users] long initialization
When I try to specify the OOB interface with --mca oob_tcp_if_include <interface from ifconfig>, I always get an error:

$ mpirun --mca oob_tcp_if_include ib0 -np 1 ./hello_c
--
An ORTE daemon has unexpectedly failed after launch and before
communicating back to mpirun. This could be caused by a number
of factors, including an inability to create a connection back
to mpirun due to a lack of common network interfaces and/or no
route found between them. Please check network connectivity
(including firewalls and network routing requirements).
-

Earlier, with OMPI 1.8.1, I could not run MPI jobs without "--mca oob_tcp_if_include ib0"... but now (OMPI 1.9a1) with this flag I get the above error.

Tue, 26 Aug 2014 09:48:35 -0700 from Ralph Castain:
>I think something may be messed up with your installation. I went ahead and tested this on a Slurm 2.5.4 cluster, and got the following:
>
>$ time mpirun -np 1 --host bend001 ./hello
>Hello, World, I am 0 of 1 [0 local peers]: get_cpubind: 0 bitmap 0,12
>
>real 0m0.086s
>user 0m0.039s
>sys 0m0.046s
>
>$ time mpirun -np 1 --host bend002 ./hello
>Hello, World, I am 0 of 1 [0 local peers]: get_cpubind: 0 bitmap 0,12
>
>real 0m0.528s
>user 0m0.021s
>sys 0m0.023s
>
>Which is what I would have expected. With --host set to the local host, no daemons are being launched and so the time is quite short (just spent mapping and fork/exec).
With --host set to a single remote host, you have the time it >takes Slurm to launch our daemon on the remote host, so you get about half of >a second. > >IIRC, you were having some problems with the OOB setup. If you specify the TCP >interface to use, does your time come down? > > >On Aug 26, 2014, at 8:32 AM, Timur Ismagilov < tismagi...@mail.ru > wrote: >>I'm using slurm 2.5.6 >> >>$salloc -N8 --exclusive -J ompi -p test >>$ srun hostname >>node1-128-21 >>node1-128-24 >>node1-128-22 >>node1-128-26 >>node1-128-27 >>node1-128-20 >>node1-128-25 >>node1-128-23 >>$ time mpirun -np 1 --host node1-128-21 ./hello_c >>Hello, world, I am 0 of 1, (Open MPI v1.9a1, package: Open MPI >>semenov@compiler-2 Distribution, ident: 1.9a1r32570, repo rev: r32570, Aug >>21, 2014 (nightly snapshot tarball), 146) >>real 1m3.932s >>user 0m0.035s >>sys 0m0.072s >> >> >>Tue, 26 Aug 2014 07:03:58 -0700 от Ralph Castain <
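Ralph's experiment above (timing mpirun against one host at a time to isolate the daemon-launch cost) can be scripted over the whole allocation. This is a dry-run sketch: `run=echo` just prints each command so it works without an MPI install; set `run=` on a real cluster. The host names mirror the srun output above.

```shell
# Dry-run sketch of per-host launch timing. With run=echo the commands
# are printed, not executed; clear run to actually time each launch.
run=echo
for host in node1-128-21 node1-128-22; do
    $run time mpirun --mca oob_tcp_if_include ib0 -np 1 --host "$host" ./hello_c
done
```

A large spread between the local host and the remote hosts points at daemon launch (Slurm, OOB wire-up) rather than MPI_Init itself.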
Re: [OMPI users] long initialization
I think something may be messed up with your installation. I went ahead and tested this on a Slurm 2.5.4 cluster, and got the following:

$ time mpirun -np 1 --host bend001 ./hello
Hello, World, I am 0 of 1 [0 local peers]: get_cpubind: 0 bitmap 0,12

real 0m0.086s
user 0m0.039s
sys 0m0.046s

$ time mpirun -np 1 --host bend002 ./hello
Hello, World, I am 0 of 1 [0 local peers]: get_cpubind: 0 bitmap 0,12

real 0m0.528s
user 0m0.021s
sys 0m0.023s

Which is what I would have expected. With --host set to the local host, no daemons are being launched and so the time is quite short (just spent mapping and fork/exec). With --host set to a single remote host, you have the time it takes Slurm to launch our daemon on the remote host, so you get about half of a second.

IIRC, you were having some problems with the OOB setup. If you specify the TCP interface to use, does your time come down?

On Aug 26, 2014, at 8:32 AM, Timur Ismagilov wrote:
> I'm using slurm 2.5.6
>
> $ salloc -N8 --exclusive -J ompi -p test
>
> $ srun hostname
> node1-128-21
> node1-128-24
> node1-128-22
> node1-128-26
> node1-128-27
> node1-128-20
> node1-128-25
> node1-128-23
>
> $ time mpirun -np 1 --host node1-128-21 ./hello_c
> Hello, world, I am 0 of 1, (Open MPI v1.9a1, package: Open MPI semenov@compiler-2 Distribution, ident: 1.9a1r32570, repo rev: r32570, Aug 21, 2014 (nightly snapshot tarball), 146)
>
> real 1m3.932s
> user 0m0.035s
> sys 0m0.072s
>
>
> Tue, 26 Aug 2014 07:03:58 -0700 from Ralph Castain:
> hmmm... what is your allocation like? do you have a large hostfile, for example?
>
> if you add a --host argument that contains just the local host, what is the time for that scenario?
>
> On Aug 26, 2014, at 6:27 AM, Timur Ismagilov wrote:
>
>> Hello!
>> Here are my timing results:
>>
>> $ time mpirun -n 1 ./hello_c
>> Hello, world, I am 0 of 1, (Open MPI v1.9a1, package: Open MPI semenov@compiler-2 Distribution, ident: 1.9a1r32570, repo rev: r32570, Aug 21, 2014 (nightly snapshot tarball), 146)
>>
>> real 1m3.985s
>> user 0m0.031s
>> sys 0m0.083s
>>
>> Fri, 22 Aug 2014 07:43:03 -0700 from Ralph Castain:
>> I'm also puzzled by your timing statement - I can't replicate it:
>>
>> 07:41:43 $ time mpirun -n 1 ./hello_c
>> Hello, world, I am 0 of 1, (Open MPI v1.9a1, package: Open MPI rhc@bend001 Distribution, ident: 1.9a1r32577, repo rev: r32577, Unreleased developer copy, 125)
>>
>> real 0m0.547s
>> user 0m0.043s
>> sys 0m0.046s
>>
>> The entire thing ran in 0.5 seconds
>>
>> On Aug 22, 2014, at 6:33 AM, Mike Dubman wrote:
>>
>>> Hi,
>>> The default delimiter is ";". You can change the delimiter with mca_base_env_list_delimiter.
>>>
>>> On Fri, Aug 22, 2014 at 2:59 PM, Timur Ismagilov wrote:
>>> Hello!
>>> If I use the latest nightly snapshot:
>>> $ ompi_info -V
>>> Open MPI v1.9a1r32570
>>>
>>> In the program hello_c, initialization takes ~1 min.
>>> In OMPI 1.8.2rc4 and earlier it takes ~1 sec (or less).
>>> If I use
>>> $ mpirun --mca mca_base_env_list 'MXM_SHM_KCOPY_MODE=off,OMP_NUM_THREADS=8' --map-by slot:pe=8 -np 1 ./hello_c
>>> I get the error
>>> config_parser.c:657 MXM ERROR Invalid value for SHM_KCOPY_MODE: 'off,OMP_NUM_THREADS=8'. Expected: [off|knem|cma|autodetect]
>>> but with -x everything works fine (with a warning):
>>> $ mpirun -x MXM_SHM_KCOPY_MODE=off -x OMP_NUM_THREADS=8 -np 1 ./hello_c
>>> WARNING: The mechanism by which environment variables are explicitly
>>> ..
>>> ..
>>> ..
>>> Hello, world, I am 0 of 1, (Open MPI v1.9a1, package: Open MPI semenov@compiler-2 Distribution, ident: 1.9a1r32570, repo rev: r32570, Aug 21, 2014 (nightly snapshot tarball), 146)
>>>
>>> Thu, 21 Aug 2014 06:26:13 -0700 from Ralph Castain:
>>> Not sure I understand.
The problem has been fixed in both the trunk and the 1.8 branch now, so you should be able to work with either of those nightly builds.
>>>
>>> On Aug 21, 2014, at 12:02 AM, Timur Ismagilov wrote:
>>> Do I have any way to run MPI jobs?
>>>
>>> Wed, 20 Aug 2014 10:48:38 -0700 from Ralph Castain:
>>> yes, I know - it is cmr'd
>>>
>>> On Aug 20, 2014, at 10:26 AM, Mike Dubman wrote:
>>> > btw, we get the same error in the v1.8 branch as well.
>>> >
>>> > On Wed, Aug 20, 2014 at 8:06 PM, Ralph Castain wrote:
>>> > It was not yet fixed - but should be now.
>>> >
>>> > On Aug 20, 2014, at 6:39 AM, Timur Ismagilov wrote:
>>> >
>>> >> Hello!
>>> >>
>>> >> As I can see, the bug is fixed, but in Open MPI v1.9a1r32516 I still have the problem
>>> >>
>>> >> a)
>>> >> $ mpirun -np 1 ./hello_c
Re: [OMPI users] long initialization
Hello! Here are my timing results:

$ time mpirun -n 1 ./hello_c
Hello, world, I am 0 of 1, (Open MPI v1.9a1, package: Open MPI semenov@compiler-2 Distribution, ident: 1.9a1r32570, repo rev: r32570, Aug 21, 2014 (nightly snapshot tarball), 146)

real 1m3.985s
user 0m0.031s
sys 0m0.083s

Fri, 22 Aug 2014 07:43:03 -0700 from Ralph Castain:
>I'm also puzzled by your timing statement - I can't replicate it:
>
>07:41:43 $ time mpirun -n 1 ./hello_c
>Hello, world, I am 0 of 1, (Open MPI v1.9a1, package: Open MPI rhc@bend001 Distribution, ident: 1.9a1r32577, repo rev: r32577, Unreleased developer copy, 125)
>
>real 0m0.547s
>user 0m0.043s
>sys 0m0.046s
>
>The entire thing ran in 0.5 seconds
>
>On Aug 22, 2014, at 6:33 AM, Mike Dubman < mi...@dev.mellanox.co.il > wrote:
>>Hi,
>>The default delimiter is ";". You can change the delimiter with mca_base_env_list_delimiter.
>>
>>On Fri, Aug 22, 2014 at 2:59 PM, Timur Ismagilov < tismagi...@mail.ru > wrote:
>>>Hello!
>>>If I use the latest nightly snapshot:
>>>$ ompi_info -V
>>>Open MPI v1.9a1r32570
>>>* In the program hello_c, initialization takes ~1 min; in OMPI 1.8.2rc4 and earlier it takes ~1 sec (or less).
>>>* If I use
>>>$ mpirun --mca mca_base_env_list 'MXM_SHM_KCOPY_MODE=off,OMP_NUM_THREADS=8' --map-by slot:pe=8 -np 1 ./hello_c
>>>I get the error
>>>config_parser.c:657 MXM ERROR Invalid value for SHM_KCOPY_MODE: 'off,OMP_NUM_THREADS=8'. Expected: [off|knem|cma|autodetect]
>>>but with -x everything works fine (with a warning):
>>>$ mpirun -x MXM_SHM_KCOPY_MODE=off -x OMP_NUM_THREADS=8 -np 1 ./hello_c
>>>WARNING: The mechanism by which environment variables are explicitly
>>>..
>>>..
>>>..
>>>Hello, world, I am 0 of 1, (Open MPI v1.9a1, package: Open MPI semenov@compiler-2 Distribution, ident: 1.9a1r32570, repo rev: r32570, Aug 21, 2014 (nightly snapshot tarball), 146)
>>>Thu, 21 Aug 2014 06:26:13 -0700 from Ralph Castain < r...@open-mpi.org >:
>>>Not sure I understand.
The problem has been fixed in both the trunk and the 1.8 branch now, so you should be able to work with either of those nightly builds. On Aug 21, 2014, at 12:02 AM, Timur Ismagilov < tismagi...@mail.ru > wrote: >Have i I any opportunity to run mpi jobs? > > >Wed, 20 Aug 2014 10:48:38 -0700 от Ralph Castain < r...@open-mpi.org >: >>yes, i know - it is cmr'd >> >>On Aug 20, 2014, at 10:26 AM, Mike Dubman < mi...@dev.mellanox.co.il > >>wrote: >>>btw, we get same error in v1.8 branch as well. >>> >>> >>>On Wed, Aug 20, 2014 at 8:06 PM, Ralph Castain < r...@open-mpi.org > >>>wrote: It was not yet fixed - but should be now. On Aug 20, 2014, at 6:39 AM, Timur Ismagilov < tismagi...@mail.ru > wrote: >Hello! > >As i can see, the bug is fixed, but in Open MPI v1.9a1r32516 i still >have the problem > >a) >$ mpirun -np 1 ./hello_c >-- >An ORTE daemon has unexpectedly failed after launch and before >communicating back to mpirun. This could be caused by a number >of factors, including an inability to create a connection back >to mpirun due to a lack of common network interfaces and/or no >route found between them. Please check network connectivity >(including firewalls and network routing requirements). >-- >b) >$ mpirun --mca oob_tcp_if_include ib0 -np 1 ./hello_c >-- >An ORTE daemon has unexpectedly failed after launch and before >communicating back to mpirun. This could be caused by a number >of factors, including an inability to create a connection back >to mpirun due to a lack of common network interfaces and/or no >route found between them. Please check network connectivity >(including firewalls and network routing requirements). 
>-- > >c) > >$ mpirun --mca oob_tcp_if_include ib0 -debug-daemons --mca >plm_base_verbose 5 -mca oob_base_verbose 10 -mca rml_base_verbose 10 >-np 1 ./hello_c >[compiler-2:14673] mca:base:select:( plm) Querying component [isolated] >[compiler-2:14673] mca:base:select:( plm) Query of component >[isolated] set priority to 0 >[compiler-2:14673] mca:base:select:( plm) Querying component [rsh] >[compiler-2:14673] mca:base:select:( plm) Query of component [rsh] set >priority to 10 >[compiler-2:14673] mca:base:select:( plm) Querying component [slurm] >[compiler-2:14673] mca:base:select:( plm) Query of component
Re: [OMPI users] long initialization
I'm also puzzled by your timing statement - I can't replicate it:

07:41:43 $ time mpirun -n 1 ./hello_c
Hello, world, I am 0 of 1, (Open MPI v1.9a1, package: Open MPI rhc@bend001 Distribution, ident: 1.9a1r32577, repo rev: r32577, Unreleased developer copy, 125)

real 0m0.547s
user 0m0.043s
sys 0m0.046s

The entire thing ran in 0.5 seconds

On Aug 22, 2014, at 6:33 AM, Mike Dubman wrote:
> Hi,
> The default delimiter is ";". You can change the delimiter with mca_base_env_list_delimiter.
>
> On Fri, Aug 22, 2014 at 2:59 PM, Timur Ismagilov wrote:
> Hello!
> If I use the latest nightly snapshot:
> $ ompi_info -V
> Open MPI v1.9a1r32570
>
> In the program hello_c, initialization takes ~1 min; in OMPI 1.8.2rc4 and earlier it takes ~1 sec (or less).
> If I use
> $ mpirun --mca mca_base_env_list 'MXM_SHM_KCOPY_MODE=off,OMP_NUM_THREADS=8' --map-by slot:pe=8 -np 1 ./hello_c
> I get the error
> config_parser.c:657 MXM ERROR Invalid value for SHM_KCOPY_MODE: 'off,OMP_NUM_THREADS=8'. Expected: [off|knem|cma|autodetect]
> but with -x everything works fine (with a warning):
> $ mpirun -x MXM_SHM_KCOPY_MODE=off -x OMP_NUM_THREADS=8 -np 1 ./hello_c
> WARNING: The mechanism by which environment variables are explicitly
> ..
> ..
> ..
> Hello, world, I am 0 of 1, (Open MPI v1.9a1, package: Open MPI semenov@compiler-2 Distribution, ident: 1.9a1r32570, repo rev: r32570, Aug 21, 2014 (nightly snapshot tarball), 146)
>
> Thu, 21 Aug 2014 06:26:13 -0700 from Ralph Castain:
> Not sure I understand. The problem has been fixed in both the trunk and the 1.8 branch now, so you should be able to work with either of those nightly builds.
>
> On Aug 21, 2014, at 12:02 AM, Timur Ismagilov wrote:
>
>> Do I have any way to run MPI jobs?
>>
>> Wed, 20 Aug 2014 10:48:38 -0700 from Ralph Castain:
>> yes, I know - it is cmr'd
>>
>> On Aug 20, 2014, at 10:26 AM, Mike Dubman wrote:
>>
>>> btw, we get the same error in the v1.8 branch as well.
>>> >>> >>> On Wed, Aug 20, 2014 at 8:06 PM, Ralph Castain wrote: >>> It was not yet fixed - but should be now. >>> >>> On Aug 20, 2014, at 6:39 AM, Timur Ismagilov wrote: >>> Hello! As i can see, the bug is fixed, but in Open MPI v1.9a1r32516 i still have the problem a) $ mpirun -np 1 ./hello_c -- An ORTE daemon has unexpectedly failed after launch and before communicating back to mpirun. This could be caused by a number of factors, including an inability to create a connection back to mpirun due to a lack of common network interfaces and/or no route found between them. Please check network connectivity (including firewalls and network routing requirements). -- b) $ mpirun --mca oob_tcp_if_include ib0 -np 1 ./hello_c -- An ORTE daemon has unexpectedly failed after launch and before communicating back to mpirun. This could be caused by a number of factors, including an inability to create a connection back to mpirun due to a lack of common network interfaces and/or no route found between them. Please check network connectivity (including firewalls and network routing requirements). 
-- c) $ mpirun --mca oob_tcp_if_include ib0 -debug-daemons --mca plm_base_verbose 5 -mca oob_base_verbose 10 -mca rml_base_verbose 10 -np 1 ./hello_c [compiler-2:14673] mca:base:select:( plm) Querying component [isolated] [compiler-2:14673] mca:base:select:( plm) Query of component [isolated] set priority to 0 [compiler-2:14673] mca:base:select:( plm) Querying component [rsh] [compiler-2:14673] mca:base:select:( plm) Query of component [rsh] set priority to 10 [compiler-2:14673] mca:base:select:( plm) Querying component [slurm] [compiler-2:14673] mca:base:select:( plm) Query of component [slurm] set priority to 75 [compiler-2:14673] mca:base:select:( plm) Selected component [slurm] [compiler-2:14673] mca: base: components_register: registering oob components [compiler-2:14673] mca: base: components_register: found loaded component tcp [compiler-2:14673] mca: base: components_register: component tcp register function successful [compiler-2:14673] mca: base: components_open: opening oob components [compiler-2:14673] mca: base: components_open: found loaded component tcp [compiler-2:14673] mca: base: components_open: component
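The verbose output above is dominated by component registration chatter; the lines that actually answer "which launcher/OOB did I get?" are the "Selected component" ones. A quick filter over a captured sample (lines taken from the log above) pulls them out:

```shell
# Sample of the -mca plm_base_verbose output quoted in the thread.
log='[compiler-2:14673] mca:base:select:( plm) Query of component [rsh] set priority to 10
[compiler-2:14673] mca:base:select:( plm) Query of component [slurm] set priority to 75
[compiler-2:14673] mca:base:select:( plm) Selected component [slurm]'

# Keep only the selection decisions; pipe a live log through the same grep.
printf '%s\n' "$log" | grep 'Selected component'
```

Here the slurm PLM wins (priority 75 over rsh's 10), which matches the Slurm allocation being used in the thread.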
Re: [OMPI users] long initialization
Hi,

The default delimiter is ";". You can change the delimiter with mca_base_env_list_delimiter.

On Fri, Aug 22, 2014 at 2:59 PM, Timur Ismagilov wrote:
> Hello!
> If I use the latest nightly snapshot:
>
> $ ompi_info -V
> Open MPI v1.9a1r32570
>
> 1. In the program hello_c, initialization takes ~1 min.
> In OMPI 1.8.2rc4 and earlier it takes ~1 sec (or less).
> 2. If I use
> $ mpirun --mca mca_base_env_list 'MXM_SHM_KCOPY_MODE=off,OMP_NUM_THREADS=8' --map-by slot:pe=8 -np 1 ./hello_c
> I get the error
> config_parser.c:657 MXM ERROR Invalid value for SHM_KCOPY_MODE: 'off,OMP_NUM_THREADS=8'. Expected: [off|knem|cma|autodetect]
> but with -x everything works fine (though with a warning):
> $ mpirun -x MXM_SHM_KCOPY_MODE=off -x OMP_NUM_THREADS=8 -np 1 ./hello_c
>
> WARNING: The mechanism by which environment variables are explicitly
> ..
> ..
> ..
> Hello, world, I am 0 of 1, (Open MPI v1.9a1, package: Open MPI semenov@compiler-2 Distribution, ident: 1.9a1r32570, repo rev: r32570, Aug 21, 2014 (nightly snapshot tarball), 146)
>
> Thu, 21 Aug 2014 06:26:13 -0700 from Ralph Castain:
> Not sure I understand. The problem has been fixed in both the trunk and the 1.8 branch now, so you should be able to work with either of those nightly builds.
>
> On Aug 21, 2014, at 12:02 AM, Timur Ismagilov wrote:
> Do I have any opportunity to run MPI jobs?
>
> Wed, 20 Aug 2014 10:48:38 -0700 from Ralph Castain:
> Yes, I know - it is CMR'd.
>
> On Aug 20, 2014, at 10:26 AM, Mike Dubman wrote:
> BTW, we get the same error in the v1.8 branch as well.
>
> On Wed, Aug 20, 2014 at 8:06 PM, Ralph Castain wrote:
> It was not yet fixed - but should be now.
>
> On Aug 20, 2014, at 6:39 AM, Timur Ismagilov wrote:
> Hello!
>
> As I can see, the bug is fixed, but in Open MPI v1.9a1r32516 I still have the problem.
>
> a) $ mpirun -np 1 ./hello_c
> --
> An ORTE daemon has unexpectedly failed after launch and before
> communicating back to mpirun.
> This could be caused by a number
> of factors, including an inability to create a connection back
> to mpirun due to a lack of common network interfaces and/or no
> route found between them. Please check network connectivity
> (including firewalls and network routing requirements).
> --
>
> b) $ mpirun --mca oob_tcp_if_include ib0 -np 1 ./hello_c
> --
> An ORTE daemon has unexpectedly failed after launch and before
> communicating back to mpirun. This could be caused by a number
> of factors, including an inability to create a connection back
> to mpirun due to a lack of common network interfaces and/or no
> route found between them. Please check network connectivity
> (including firewalls and network routing requirements).
> --
>
> c) $ mpirun --mca oob_tcp_if_include ib0 -debug-daemons --mca plm_base_verbose 5 -mca oob_base_verbose 10 -mca rml_base_verbose 10 -np 1 ./hello_c
>
> [compiler-2:14673] mca:base:select:( plm) Querying component [isolated]
> [compiler-2:14673] mca:base:select:( plm) Query of component [isolated] set priority to 0
> [compiler-2:14673] mca:base:select:( plm) Querying component [rsh]
> [compiler-2:14673] mca:base:select:( plm) Query of component [rsh] set priority to 10
> [compiler-2:14673] mca:base:select:( plm) Querying component [slurm]
> [compiler-2:14673] mca:base:select:( plm) Query of component [slurm] set priority to 75
> [compiler-2:14673] mca:base:select:( plm) Selected component [slurm]
> [compiler-2:14673] mca: base: components_register: registering oob components
> [compiler-2:14673] mca: base: components_register: found loaded component tcp
> [compiler-2:14673] mca: base: components_register: component tcp register function successful
> [compiler-2:14673] mca: base: components_open: opening oob components
> [compiler-2:14673] mca: base: components_open: found loaded component tcp
> [compiler-2:14673] mca: base: components_open: component tcp open function successful
> [compiler-2:14673] mca:oob:select: checking available component tcp
> [compiler-2:14673] mca:oob:select: Querying component [tcp]
> [compiler-2:14673] oob:tcp: component_available called
> [compiler-2:14673] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
> [compiler-2:14673] WORKING INTERFACE 2 KERNEL INDEX 3 FAMILY: V4
> [compiler-2:14673] WORKING INTERFACE 3 KERNEL INDEX 4 FAMILY: V4
> [compiler-2:14673] WORKING INTERFACE 4 KERNEL INDEX 5 FAMILY: V4
> [compiler-2:14673] WORKING INTERFACE 5 KERNEL INDEX 6 FAMILY: V4
> [compiler-2:14673]
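[Editor's note: the delimiter behaviour described in this thread explains the MXM error exactly. With the default ';' delimiter, the comma-separated value 'MXM_SHM_KCOPY_MODE=off,OMP_NUM_THREADS=8' is parsed as a single variable whose value is 'off,OMP_NUM_THREADS=8', which MXM then rejects. A minimal sketch of that splitting logic (an illustration, not Open MPI's actual parser):]

```python
def parse_env_list(env_list: str, delimiter: str = ";") -> dict:
    """Split a NAME=VALUE list on the given delimiter, to illustrate
    why the delimiter matters for mca_base_env_list."""
    result = {}
    for entry in env_list.split(delimiter):
        name, _, value = entry.partition("=")
        result[name] = value
    return result

spec = "MXM_SHM_KCOPY_MODE=off,OMP_NUM_THREADS=8"

# With the default ';' delimiter the whole string is one entry, so the
# comma and everything after it end up inside the first value:
print(parse_env_list(spec))       # {'MXM_SHM_KCOPY_MODE': 'off,OMP_NUM_THREADS=8'}

# Splitting on ',' instead yields the two variables the user intended:
print(parse_env_list(spec, ","))  # {'MXM_SHM_KCOPY_MODE': 'off', 'OMP_NUM_THREADS': '8'}
```

So either write the list with ';' (quoted, so the shell does not treat ';' as a command separator), e.g. 'MXM_SHM_KCOPY_MODE=off;OMP_NUM_THREADS=8', or set mca_base_env_list_delimiter to ',' as Mike suggests.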