Re: [OMPI users] Mac OS X Static PGI
On Mar 1, 2011, at 1:34 PM, David Robertson wrote:

> Hi,
>
> > Error means OMPI didn't find a network interface - do you have your
> > networks turned off? Sometimes people travel with Airport turned off.
> > If you have no wire connected, then no interfaces exist.
>
> I am logged in to the machine remotely through the wired interface. The
> Airport is always off. I have Open MPI built and running fine with gcc/ifort
> and gcc/gfortran using shared libraries. I have compiled and run successfully
> with both shared and static libraries with gcc/ifort. I have not tried the
> static libraries with gfortran/gcc.
>
> ifconfig gives me:
>
> lo0: flags=8049 mtu 16384
>     inet6 ::1 prefixlen 128
>     inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
>     inet 127.0.0.1 netmask 0xff00
> gif0: flags=8010 mtu 1280
> stf0: flags=0<> mtu 1280
> en0: flags=8863 mtu 1500
>     ether 10:9a:dd:55:bb:52
>     inet6 fe80::129a:ddff:fe55:bb52%en0 prefixlen 64 scopeid 0x4
>     inet 192.168.30.13 netmask 0xc000 broadcast 192.168.63.255
>     media: autoselect (1000baseT)
>     status: active
> fw0: flags=8863 mtu 4078
>     lladdr 70:cd:60:ff:fe:2f:01:8e
>     media: autoselect
>     status: inactive
> en1: flags=8863 mtu 1500
>     ether c8:bc:c8:c9:fc:a9
>     media: autoselect ()
>     status: inactive
> vnic0: flags=8843 mtu 1500
>     ether 00:1c:42:00:00:08
>     inet 10.211.55.2 netmask 0xff00 broadcast 10.211.55.255
>     media: autoselect
>     status: active
> vnic1: flags=8843 mtu 1500
>     ether 00:1c:42:00:00:09
>     inet 10.37.129.2 netmask 0xff00 broadcast 10.37.129.255
>     media: autoselect
>     status: active
> vboxnet0: flags=8842 mtu 1500
>     ether 0a:00:27:00:00:00
>
> Are you saying that Open MPI is only looking for the Airport (en1) card and
> not en0?

No, it isn't. However, what the error message says is as I indicated - it is
failing because it gets an error when trying to open a port on an available
network. I can't debug your network to find out why.

I know that the Mac doesn't really like (nor does Apple really support) static
builds, and it has been a long time since I have built Open MPI that way on my
Mac. Looking at my old static config file, I don't see anything special in it.
That said, I know we had some early problems with static builds on the Mac
(like I said, Apple doesn't really support them). Those were solved, though,
and none of those problems had this symptom. It could be something strange
about PGI and the socket libraries when running static, but I wouldn't know -
I don't use PGI. Sorry I can't be of more help - I suggest asking PGI about
socket support with their compiler on the Mac, or not using PGI at all if they
only support static builds, given Apple's lack of support for that mode of
operation on the Mac (it seems bizarre that PGI would demand it).

> Why would it do that for PGI only?

It doesn't, nor does it care what compiler is used.

> Thanks,
> Dave
>
> On Mar 1, 2011, at 11:50 AM, David Robertson wrote:
>
> > Hi all,
> >
> > I am having trouble with PGI on Mac OS X 10.6.6. PGI's support staff has
> > informed me that PGI does not "support 64-bit shared library creation" on
> > the Mac. Therefore, I have built Open MPI in static-only mode
> > (--disable-shared --enable-static).
> >
> > I have to do some manipulation to get my application to pass the final
> > linking stage (more on that at the bottom) but I get an immediate crash at
> > runtime:
> >
> > start of output
> > bash-3.2$ mpirun -np 4 oceanG ocean_upwelling.in
> > [flask.marine.rutgers.edu:14186] opal_ifinit: unable to find network
> > interfaces.
> > [flask.marine.rutgers.edu:14186] [[65522,0],0] ORTE_ERROR_LOG: Error in
> > file ess_hnp_module.c at line 181
> > --
> > It looks like orte_init failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during orte_init; some of which are due to configuration or
> > environment problems. This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> >
> > orte_rml_base_select failed
> > --> Returned value Error (-1) instead of ORTE_SUCCESS
> > --
> > [flask.marine.rutgers.edu:14186] [[65522,0],0] ORTE_ERROR_LOG: Error in
> > file runtime/orte_init.c at line 132
[OMPI users] MPI_ALLREDUCE bug with 1.5.2rc3r24441
Hi,

there appears to be a regression in revision 1.5.2rc3r24441. The attached
program crashes even with 1 PE with:

 Default real, digits: 4 24
 Real kind, digits:    8 53
 Integer kind, bits:   8 64
 Default integer:      4 32
 Sum[real]:       1.000 2.000 3.000
 Sum[real(8)]:    1. 2. 3.
 Sum[integer(4)]: 1 2 3
 [proton:24826] *** An error occurred in MPI_Allreduce: the reduction
 operation MPI_SUM is not defined on the MPI_INTEGER8 datatype

On the other hand,

 % ompi_info --arch
 Configured architecture: i686-pc-linux-gnu
 % ompi_info --all | grep 'integer[48]'
 Fort have integer4: yes
 Fort have integer8: yes
 Fort integer4 size: 4
 Fort integer8 size: 8
 Fort integer4 align: 4
 Fort integer8 align: 8

There are no problems with 1.4.x and earlier revisions.

program test
  use mpi
  implicit none
  integer, parameter :: i8 = selected_int_kind (15)
  integer, parameter :: r8 = selected_real_kind (15,90)
  integer, parameter :: N = 3
  integer     :: i4i(N), i4s(N)
  integer(i8) :: i8i(N), i8s(N)
  real        :: r4i(N), r4s(N)
  real(r8)    :: r8i(N), r8s(N)
  integer :: ierr, nproc, myrank, i

  i4i = (/ (i, i=1,N) /); i8i = (/ (i, i=1,N) /)
  r4i = (/ (i, i=1,N) /); r8i = (/ (i, i=1,N) /)

  call MPI_INIT (ierr)
  call MPI_COMM_SIZE (MPI_COMM_WORLD, nproc, ierr)
  call MPI_COMM_RANK (MPI_COMM_WORLD, myrank, ierr)

  if (myrank == 0) then
     print *, "Default real, digits:", kind (1.0), digits (1.0)
     print *, "Real kind, digits:   ", r8, digits (1._r8)
     print *, "Integer kind, bits:  ", i8, bit_size (1_i8)
     print *, "Default integer:     ", kind (1), bit_size (1)
  end if

  call MPI_ALLREDUCE (r4i, r4s, N, MPI_REAL, MPI_SUM, MPI_COMM_WORLD, ierr)
  if (myrank == 0) print *, "Sum[real]:", r4s
  call MPI_ALLREDUCE (r8i, r8s, N, MPI_REAL8, MPI_SUM, MPI_COMM_WORLD, ierr)
  if (myrank == 0) print *, "Sum[real(8)]:", r8s
  call MPI_ALLREDUCE (i4i, i4s, N, MPI_INTEGER4, MPI_SUM, MPI_COMM_WORLD, ierr)
  if (myrank == 0) print *, "Sum[integer(4)]:", i4s
  call MPI_ALLREDUCE (i8i, i8s, N, MPI_INTEGER8, MPI_SUM, MPI_COMM_WORLD, ierr)
  if (myrank == 0) print *, "Sum[integer(8)]:", i8s
  call MPI_FINALIZE (ierr)
end program test
Re: [OMPI users] Mac OS X Static PGI
Hi,

> Error means OMPI didn't find a network interface - do you have your
> networks turned off? Sometimes people travel with Airport turned off.
> If you have no wire connected, then no interfaces exist.

I am logged in to the machine remotely through the wired interface. The
Airport is always off. I have Open MPI built and running fine with gcc/ifort
and gcc/gfortran using shared libraries. I have compiled and run successfully
with both shared and static libraries with gcc/ifort. I have not tried the
static libraries with gfortran/gcc.

ifconfig gives me:

lo0: flags=8049 mtu 16384
    inet6 ::1 prefixlen 128
    inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
    inet 127.0.0.1 netmask 0xff00
gif0: flags=8010 mtu 1280
stf0: flags=0<> mtu 1280
en0: flags=8863 mtu 1500
    ether 10:9a:dd:55:bb:52
    inet6 fe80::129a:ddff:fe55:bb52%en0 prefixlen 64 scopeid 0x4
    inet 192.168.30.13 netmask 0xc000 broadcast 192.168.63.255
    media: autoselect (1000baseT)
    status: active
fw0: flags=8863 mtu 4078
    lladdr 70:cd:60:ff:fe:2f:01:8e
    media: autoselect
    status: inactive
en1: flags=8863 mtu 1500
    ether c8:bc:c8:c9:fc:a9
    media: autoselect ()
    status: inactive
vnic0: flags=8843 mtu 1500
    ether 00:1c:42:00:00:08
    inet 10.211.55.2 netmask 0xff00 broadcast 10.211.55.255
    media: autoselect
    status: active
vnic1: flags=8843 mtu 1500
    ether 00:1c:42:00:00:09
    inet 10.37.129.2 netmask 0xff00 broadcast 10.37.129.255
    media: autoselect
    status: active
vboxnet0: flags=8842 mtu 1500
    ether 0a:00:27:00:00:00

Are you saying that Open MPI is only looking for the Airport (en1) card and
not en0? Why would it do that for PGI only?

Thanks,
Dave

On Mar 1, 2011, at 11:50 AM, David Robertson wrote:

> Hi all,
>
> I am having trouble with PGI on Mac OS X 10.6.6. PGI's support staff has
> informed me that PGI does not "support 64-bit shared library creation" on
> the Mac. Therefore, I have built Open MPI in static-only mode
> (--disable-shared --enable-static).
>
> I have to do some manipulation to get my application to pass the final
> linking stage (more on that at the bottom) but I get an immediate crash at
> runtime:
>
> start of output
> bash-3.2$ mpirun -np 4 oceanG ocean_upwelling.in
> [flask.marine.rutgers.edu:14186] opal_ifinit: unable to find network
> interfaces.
> [flask.marine.rutgers.edu:14186] [[65522,0],0] ORTE_ERROR_LOG: Error in
> file ess_hnp_module.c at line 181
> --
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_rml_base_select failed
> --> Returned value Error (-1) instead of ORTE_SUCCESS
> --
> [flask.marine.rutgers.edu:14186] [[65522,0],0] ORTE_ERROR_LOG: Error in
> file runtime/orte_init.c at line 132
> --
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_ess_set_name failed
> --> Returned value Error (-1) instead of ORTE_SUCCESS
> --
> [flask.marine.rutgers.edu:14186] [[65522,0],0] ORTE_ERROR_LOG: Error in
> file orterun.c at line 543
> end of output
>
> When I google for this error the only result I find is for a patch to
> version 1.1.2 which doesn't even resemble the current state of the Open MPI
> code.
>
> iMac info:
>
> ProductName: Mac OS X
> ProductVersion: 10.6.6
> BuildVersion: 10J567
>
> Has anyone seen this before or have an idea what to try?
>
> Thanks,
> Dave
>
> P.S. I get the same results with Open MPI configured with:
>
> ./configure --prefix=/opt/pgisoft/openmpi/openmpi-1.4.3 CC=pgcc CXX=pgcpp
> F77=pgf77 FC=pgf90 --enable-mpirun-prefix-by-default --disable-shared
> --enable-static --without-memory-manager
Re: [OMPI users] Mac OS X Static PGI
Error means OMPI didn't find a network interface - do you have your networks
turned off? Sometimes people travel with Airport turned off. If you have no
wire connected, then no interfaces exist.

Sent from my iPad

On Mar 1, 2011, at 11:50 AM, David Robertson wrote:

> Hi all,
>
> I am having trouble with PGI on Mac OS X 10.6.6. PGI's support staff has
> informed me that PGI does not "support 64-bit shared library creation" on
> the Mac. Therefore, I have built Open MPI in static-only mode
> (--disable-shared --enable-static).
>
> I have to do some manipulation to get my application to pass the final
> linking stage (more on that at the bottom) but I get an immediate crash at
> runtime:
>
> start of output
> bash-3.2$ mpirun -np 4 oceanG ocean_upwelling.in
> [flask.marine.rutgers.edu:14186] opal_ifinit: unable to find network
> interfaces.
> [flask.marine.rutgers.edu:14186] [[65522,0],0] ORTE_ERROR_LOG: Error in
> file ess_hnp_module.c at line 181
> --
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_rml_base_select failed
> --> Returned value Error (-1) instead of ORTE_SUCCESS
> --
> [flask.marine.rutgers.edu:14186] [[65522,0],0] ORTE_ERROR_LOG: Error in
> file runtime/orte_init.c at line 132
> --
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_ess_set_name failed
> --> Returned value Error (-1) instead of ORTE_SUCCESS
> --
> [flask.marine.rutgers.edu:14186] [[65522,0],0] ORTE_ERROR_LOG: Error in
> file orterun.c at line 543
> end of output
>
> When I google for this error the only result I find is for a patch to
> version 1.1.2 which doesn't even resemble the current state of the Open MPI
> code.
>
> iMac info:
>
> ProductName: Mac OS X
> ProductVersion: 10.6.6
> BuildVersion: 10J567
>
> Has anyone seen this before or have an idea what to try?
>
> Thanks,
> Dave
>
> P.S. I get the same results with Open MPI configured with:
>
> ./configure --prefix=/opt/pgisoft/openmpi/openmpi-1.4.3 CC=pgcc CXX=pgcpp
> F77=pgf77 FC=pgf90 --enable-mpirun-prefix-by-default --disable-shared
> --enable-static --without-memory-manager --without-libnuma --disable-ipv6
> --disable-io-romio --disable-heterogeneous --enable-mpi-f77 --enable-mpi-f90
> --enable-mpi-profile
>
> and
>
> ./configure --prefix=/opt/pgisoft/openmpi/openmpi-1.4.3 CC=pgcc CXX=pgcpp
> F77=pgf77 FC=pgf90 --disable-shared --enable-static
>
> P.P.S. Linking workarounds:
>
> Snow Leopard ships with Open MPI libraries that interfere when linking
> programs built with my compiled mpif90. The problem is that 'ld' searches
> every directory in the search path for shared objects before it will look
> for static archives. That means a line like:
>
> pgf90 x.o -o a.out -L/opt/openmpi/lib -lmpi_f90 -lmpi_f77 -lmpi
>
> will use the .a files in /opt/openmpi/lib for the Fortran bindings (because
> Snow Leopard doesn't ship with them), but when it gets to -lmpi it picks up
> libmpi.dylib from /usr/lib, which causes undefined references. Note the line
> above is inferred using the -show:link option to mpif90.
>
> I have found two workarounds. One is to edit the
> share/openmpi/mpif90-wrapper-data.txt file to use full paths to the static
> libraries (this is what the PGI-shipped version of Open MPI does). The other
> is to add the line:
>
> switch -search_paths_first is replace(-search_paths_first) positional(linker);
>
> to the /path/to/pgi/bin/siterc file and set LDFLAGS to -search_paths_first
> in my application.
>
> From the ld manpage:
>
> -search_paths_first
>     By default the -lx and -weak-lx options first search for a file
>     of the form `libx.dylib' in each directory in the library search
>     path, then a file of the form `libx.a' is searched for in the
>     library search paths. This option changes it so that in each
>     path `libx.dylib' is searched for then `libx.a' before the next
>     path in the library search path is searched.
[OMPI users] Mac OS X Static PGI
Hi all,

I am having trouble with PGI on Mac OS X 10.6.6. PGI's support staff has
informed me that PGI does not "support 64-bit shared library creation" on the
Mac. Therefore, I have built Open MPI in static-only mode (--disable-shared
--enable-static).

I have to do some manipulation to get my application to pass the final
linking stage (more on that at the bottom) but I get an immediate crash at
runtime:

start of output
bash-3.2$ mpirun -np 4 oceanG ocean_upwelling.in
[flask.marine.rutgers.edu:14186] opal_ifinit: unable to find network
interfaces.
[flask.marine.rutgers.edu:14186] [[65522,0],0] ORTE_ERROR_LOG: Error in file
ess_hnp_module.c at line 181
--
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

orte_rml_base_select failed
--> Returned value Error (-1) instead of ORTE_SUCCESS
--
[flask.marine.rutgers.edu:14186] [[65522,0],0] ORTE_ERROR_LOG: Error in file
runtime/orte_init.c at line 132
--
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

orte_ess_set_name failed
--> Returned value Error (-1) instead of ORTE_SUCCESS
--
[flask.marine.rutgers.edu:14186] [[65522,0],0] ORTE_ERROR_LOG: Error in file
orterun.c at line 543
end of output

When I google for this error the only result I find is for a patch to version
1.1.2 which doesn't even resemble the current state of the Open MPI code.

iMac info:

ProductName: Mac OS X
ProductVersion: 10.6.6
BuildVersion: 10J567

Has anyone seen this before or have an idea what to try?

Thanks,
Dave

P.S. I get the same results with Open MPI configured with:

./configure --prefix=/opt/pgisoft/openmpi/openmpi-1.4.3 CC=pgcc CXX=pgcpp
F77=pgf77 FC=pgf90 --enable-mpirun-prefix-by-default --disable-shared
--enable-static --without-memory-manager --without-libnuma --disable-ipv6
--disable-io-romio --disable-heterogeneous --enable-mpi-f77 --enable-mpi-f90
--enable-mpi-profile

and

./configure --prefix=/opt/pgisoft/openmpi/openmpi-1.4.3 CC=pgcc CXX=pgcpp
F77=pgf77 FC=pgf90 --disable-shared --enable-static

P.P.S. Linking workarounds:

Snow Leopard ships with Open MPI libraries that interfere when linking
programs built with my compiled mpif90. The problem is that 'ld' searches
every directory in the search path for shared objects before it will look for
static archives. That means a line like:

pgf90 x.o -o a.out -L/opt/openmpi/lib -lmpi_f90 -lmpi_f77 -lmpi

will use the .a files in /opt/openmpi/lib for the Fortran bindings (because
Snow Leopard doesn't ship with them), but when it gets to -lmpi it picks up
libmpi.dylib from /usr/lib, which causes undefined references. Note the line
above is inferred using the -show:link option to mpif90.

I have found two workarounds. One is to edit the
share/openmpi/mpif90-wrapper-data.txt file to use full paths to the static
libraries (this is what the PGI-shipped version of Open MPI does). The other
is to add the line:

switch -search_paths_first is replace(-search_paths_first) positional(linker);

to the /path/to/pgi/bin/siterc file and set LDFLAGS to -search_paths_first in
my application.

From the ld manpage:

-search_paths_first
    By default the -lx and -weak-lx options first search for a file
    of the form `libx.dylib' in each directory in the library search
    path, then a file of the form `libx.a' is searched for in the
    library search paths. This option changes it so that in each
    path `libx.dylib' is searched for then `libx.a' before the next
    path in the library search path is searched.
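[Editor's note: the two search orders described in the ld manpage excerpt above can be illustrated with a small sketch. The directory contents below are hypothetical stand-ins mirroring the situation described in this message (a static-only Open MPI install plus the system's libmpi.dylib), not a real filesystem scan.]

```python
# Sketch of ld's two library-search strategies on Mac OS X.
# AVAILABLE is a hypothetical view of what each directory contains.
AVAILABLE = {
    "/opt/openmpi/lib": {"libmpi_f90.a", "libmpi_f77.a", "libmpi.a"},
    "/usr/lib": {"libmpi.dylib"},
}

def find_library(lib, search_path, search_paths_first=False):
    """Return the file ld would pick for -l<lib>, or None if not found."""
    names = [f"lib{lib}.dylib", f"lib{lib}.a"]  # dylib preferred over .a
    if search_paths_first:
        # -search_paths_first: try both names in each directory in turn
        for d in search_path:
            for n in names:
                if n in AVAILABLE.get(d, set()):
                    return f"{d}/{n}"
    else:
        # default: try the .dylib name in *every* directory, then the .a name
        for n in names:
            for d in search_path:
                if n in AVAILABLE.get(d, set()):
                    return f"{d}/{n}"
    return None

path = ["/opt/openmpi/lib", "/usr/lib"]
print(find_library("mpi", path))                           # -> /usr/lib/libmpi.dylib
print(find_library("mpi", path, search_paths_first=True))  # -> /opt/openmpi/lib/libmpi.a
```

This is why -lmpi resolves to the system dylib by default even though /opt/openmpi/lib appears first on the command line, and why -search_paths_first (or spelling out full paths to the .a files) fixes the link.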
Re: [OMPI users] RDMACM Differences
On Feb 28, 2011, at 12:49 PM, Jagga Soorma wrote:

> -bash-3.2$ mpiexec --mca btl openib,self -mca
> btl_openib_warn_default_gid_prefix 0 -np 2 --hostfile mpihosts
> /home/jagga/osu-micro-benchmarks-3.3/openmpi/ofed-1.5.2/bin/osu_latency

Your use of btl_openib_warn_default_gid_prefix may have brought up a subtle
issue in Open MPI's verbs support. More below.

> # OSU MPI Latency Test v3.3
> # Size    Latency (us)
> [amber04][[10252,1],1][connect/btl_openib_connect_oob.c:325:qp_connect_all]
> error modifing QP to RTR errno says Invalid argument
> [amber04][[10252,1],1][connect/btl_openib_connect_oob.c:815:rml_recv_cb]
> error in endpoint reply start connect

Looking at this error message and your ibv_devinfo output:

> [root@amber03 ~]# ibv_devinfo
> hca_id: mlx4_0
>     transport:      InfiniBand (0)
>     fw_ver:         2.7.9294
>     node_guid:      78e7:d103:0021:8884
>     sys_image_guid: 78e7:d103:0021:8887
>     vendor_id:      0x02c9
>     vendor_part_id: 26438
>     hw_ver:         0xB0
>     board_id:       HP_020003
>     phys_port_cnt:  2
>     port: 1
>         state:      PORT_ACTIVE (4)
>         max_mtu:    2048 (4)
>         active_mtu: 2048 (4)
>         sm_lid:     1
>         port_lid:   20
>         port_lmc:   0x00
>         link_layer: IB
>
>     port: 2
>         state:      PORT_ACTIVE (4)
>         max_mtu:    2048 (4)
>         active_mtu: 1024 (3)
>         sm_lid:     0
>         port_lid:   0
>         port_lmc:   0x00
>         link_layer: Ethernet

It looks like you have one HCA port configured as IB and the other as
Ethernet. I'm wondering if OMPI is not taking the device transport into
account and is *only* using the subnet ID to determine reachability (i.e., I'm
wondering if we didn't anticipate multiple devices/ports with the same subnet
ID but with different transports). I pointed this out to Mellanox yesterday; I
think they're following up on it.

In the meantime, a workaround might be to set a non-default subnet ID on your
IB network. That should allow Open MPI to tell these networks apart without
additional help.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] Unknown overhead in "mpirun -am ft-enable-cr"
I have not had the time to look into the performance problem yet, and probably
won't for a little while. Can you send me a small program that illustrates the
performance problem? I'll file a bug so we don't lose track of it.

Thanks,
Josh

On Feb 25, 2011, at 1:31 PM, Nguyen Toan wrote:

> Dear Josh,
>
> Did you find out the problem? I still cannot make any progress.
> Hope to hear some good news from you.
>
> Regards,
> Nguyen Toan
>
> On Sun, Feb 13, 2011 at 3:04 PM, Nguyen Toan wrote:
> Hi Josh,
>
> I tried the MCA parameter you mentioned but it did not help; the unknown
> overhead still exists. Here I attach the output of 'ompi_info', for both
> version 1.5 and 1.5.1. Hope you can find out the problem.
> Thank you.
>
> Regards,
> Nguyen Toan
>
> On Wed, Feb 9, 2011 at 11:08 PM, Joshua Hursey wrote:
> It looks like the logic in the configure script is turning on the FT thread
> for you when you specify both '--with-ft=cr' and '--enable-mpi-threads'.
>
> Can you send me the output of 'ompi_info'? Can you also try the MCA
> parameter that I mentioned earlier to see if that changes the performance?
>
> If there are many non-blocking sends and receives, there might be a
> performance bug with the way the point-to-point wrapper is tracking request
> objects. If the above MCA parameter does not help the situation, let me know
> and I might be able to take a look at this next week.
>
> Thanks,
> Josh
>
> On Feb 9, 2011, at 1:40 AM, Nguyen Toan wrote:
>
> > Hi Josh,
> > Thanks for the reply. I did not use the '--enable-ft-thread' option. Here
> > are my build options:
> >
> > CFLAGS=-g \
> > ./configure \
> > --with-ft=cr \
> > --enable-mpi-threads \
> > --with-blcr=/home/nguyen/opt/blcr \
> > --with-blcr-libdir=/home/nguyen/opt/blcr/lib \
> > --prefix=/home/nguyen/opt/openmpi \
> > --with-openib \
> > --enable-mpirun-prefix-by-default
> >
> > My application requires lots of communication in every loop, focusing on
> > MPI_Isend, MPI_Irecv and MPI_Wait. Also I want to make only one checkpoint
> > per application execution for my purpose, but the unknown overhead exists
> > even when no checkpoint was taken.
> >
> > Do you have any other idea?
> >
> > Regards,
> > Nguyen Toan
> >
> > On Wed, Feb 9, 2011 at 12:41 AM, Joshua Hursey wrote:
> > There are a few reasons why this might be occurring. Did you build with
> > the '--enable-ft-thread' option?
> >
> > If so, it looks like I didn't move over the thread_sleep_wait adjustment
> > from the trunk - the thread was being a bit too aggressive. Try adding the
> > following to your command line options, and see if it changes the
> > performance:
> >
> > "-mca opal_cr_thread_sleep_wait 1000"
> >
> > There are other places to look as well depending on how frequently your
> > application communicates, how often you checkpoint, process layout, ...
> > But usually the aggressive nature of the thread is the main problem.
> >
> > Let me know if that helps.
> >
> > -- Josh
> >
> > On Feb 8, 2011, at 2:50 AM, Nguyen Toan wrote:
> >
> > > Hi all,
> > >
> > > I am using the latest version of OpenMPI (1.5.1) and BLCR (0.8.2).
> > > I found that when running an application which uses MPI_Isend, MPI_Irecv
> > > and MPI_Wait, enabling C/R, i.e. using "-am ft-enable-cr", the
> > > application runtime is much longer than the normal execution with mpirun
> > > (no checkpoint was taken). This overhead becomes larger when the normal
> > > execution runtime is longer.
> > > Does anybody have any idea about this overhead, and how to eliminate it?
> > > Thanks.
> > >
> > > Regards,
> > > Nguyen
> > > ___
> > > users mailing list
> > > us...@open-mpi.org
> > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> > Joshua Hursey
> > Postdoctoral Research Associate
> > Oak Ridge National Laboratory
> > http://users.nccs.gov/~jjhursey

Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey
Re: [OMPI users] Basic question on portability
Yes, you will have problems. We did not formally introduce ABI compatibility
until version 1.3.2. Meaning: an application compiled with 1.3.2 will
successfully link/run against any 1.3.x version >= 1.3.2, and against any
1.4.x version.

v1.5 broke ABI with the v1.3/v1.4 series, but it will also be stable for the
duration of the v1.5/v1.6 series. We have no definite plans yet for v1.7, but
it is likely that the ABI story will be the same there, too: a break from
v1.5/v1.6, then stable for v1.7/v1.8.

On Mar 1, 2011, at 11:25 AM, Blosch, Edwin L wrote:

> If I compile against OpenMPI 1.2.8, shared linkage, on one system, then move
> the executable to another system with OpenMPI 1.4.x or 1.5.x, will I have
> any problems running the executable?
>
> Thanks

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] using MPI through Qt
Eye RCS 51 wrote:

> Hi,
>
> In an effort to make a Qt gui using MPI, I have the following:
>
> 1. Gui started in master node.
> 2. In Gui, through a pushbutton, a global variable x is assigned some value;
>    let's say x = 1000;
> 3. I want this value to be known to all nodes. So I used broadcast in the
>    function assigning it on the master node and all other nodes.
> 4. I printed values of x, which prints all 1000 in all nodes.
> 5. Now control has reached MPI_Finalize in all nodes except master.
>
> Now if I want to reassign the value of x using the pushbutton in the master
> node and again broadcast to and print in all nodes, can it be done?

Not with MPI if MPI_Finalize has been called.

> I mean, can I have an MPI function which through the GUI is called many
> times and assigns and prints WHILE the program is running?

You can call an MPI function like MPI_Bcast many times. E.g.,

    MPI_Init(...);
    MPI_Comm_rank(..., &myrank);
    while (...) {
        if (myrank == MASTER)
            x = ...;
        MPI_Bcast(&x, ...);
    }
    MPI_Finalize();

There are many helpful MPI tutorials that can be found on the internet.

> OR simply can I have a print function which prints the node rank value in
> all nodes whenever the pushbutton is pressed while the program is running.
>
> The command I used is "mpirun -np 3 ./a.out".
[OMPI users] Basic question on portability
If I compile against OpenMPI 1.2.8, shared linkage, on one system, then move the executable to another system with OpenMPI 1.4.x or 1.5.x, will I have any problems running the executable? Thanks
Re: [OMPI users] using MPI through Qt
Certainly you may call MPI functions many times; the problem is that you need
matching receives (or collectives) posted at your slave nodes, which is only
determined at run time. Perhaps this could be done with two communications:
first broadcast the type of communication to the slaves (for example, 1 for a
collective broadcast, 2 for a scatter, etc.) - you can encode whatever you
wish in an integer. Once the slaves receive the code, they respond
accordingly, posting the corresponding MPI receive. Clearly, a way to let the
slaves exit the while loop is needed if you want them to exit cleanly; the
exit request can also be encoded in the integer you send out.

On Tue, Mar 1, 2011 at 12:39 AM, Eye RCS 51 wrote:

> Hi,
>
> In an effort to make a Qt gui using MPI, I have the following:
>
> 1. Gui started in master node.
> 2. In Gui, through a pushbutton, a global variable x is assigned some value;
>    let's say x = 1000;
> 3. I want this value to be known to all nodes. So I used broadcast in the
>    function assigning it on the master node and all other nodes.
> 4. I printed values of x, which prints all 1000 in all nodes.
> 5. Now control has reached MPI_Finalize in all nodes except master.
>
> Now if I want to reassign the value of x using the pushbutton in the master
> node and again broadcast to and print in all nodes, can it be done?
> I mean, can I have an MPI function which through the GUI is called many
> times and assigns and prints WHILE the program is running?
>
> OR simply can I have a print function which prints the node rank value in
> all nodes whenever the pushbutton is pressed while the program is running.
>
> The command I used is "mpirun -np 3 ./a.out".
>
> Any help will be appreciated.
> Thank you very much.
>
> --
> eye51

--
David Zhang
University of California, San Diego
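[Editor's note: the command-code protocol suggested above can be sketched independently of MPI. This is a structural sketch only - the names and command codes are made up, and next_command() stands in for the MPI_Bcast by which the master would distribute the code to the slaves.]

```python
# Structural sketch of the suggested master/slave command loop.
# All names and command codes here are hypothetical; in a real program,
# next_command() would be an MPI_Bcast of one integer from the master,
# and each handler would post the matching collective (bcast, scatter, ...).
CMD_EXIT, CMD_BCAST, CMD_SCATTER = 0, 1, 2

def slave_loop(next_command, handlers):
    """Dispatch on integer command codes until CMD_EXIT arrives."""
    log = []
    while True:
        cmd = next_command()      # stand-in for receiving the broadcast code
        if cmd == CMD_EXIT:       # the exit request is just another code
            break
        handlers[cmd]()           # post the matching receive/collective
        log.append(cmd)
    return log

# Simulate the master pressing the pushbutton three times, then quitting.
actions = []
handlers = {
    CMD_BCAST: lambda: actions.append("bcast"),      # would call MPI_Bcast
    CMD_SCATTER: lambda: actions.append("scatter"),  # would call MPI_Scatter
}
pending = iter([CMD_BCAST, CMD_SCATTER, CMD_BCAST, CMD_EXIT])
log = slave_loop(lambda: next(pending), handlers)
print(actions)  # ['bcast', 'scatter', 'bcast']
```

The point of the design is that the slaves never block in an unexpected collective: they always receive the one-integer code first, so master and slaves agree at run time on which MPI call to post next.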
Re: [OMPI users] RoCE (IBoE) & OpenMPI
I thought you mentioned in a prior email that you had gotten one or two other
OFED sample applications to work properly. How are they setting the SL? Are
they not using the RDMA CM?

On Mar 1, 2011, at 7:35 AM, Michael Shuey wrote:

> So, since RoCE has no SM, and setting an SL is required to get lossless
> ethernet on Cisco switches (and possibly others), does this mean that RoCE
> will never work correctly with OpenMPI on Cisco hardware?
>
> --
> Mike Shuey
>
> On Tue, Mar 1, 2011 at 3:42 AM, Doron Shoham wrote:
> > Hi,
> >
> > Regarding using a specific SL with RDMA CM, I've checked in the code and
> > it seems that RDMA_CM uses the SL from the SA.
> > So if you want to configure a specific SL, you need to do it via the SM.
> >
> > Doron
> >
> > -----Original Message-----
> > From: Jeff Squyres [mailto:jsquy...@cisco.com]
> > Sent: Thursday, February 24, 2011 3:45 PM
> > To: Michael Shuey
> > Cc: Open MPI Users, Mike Dubman
> > Subject: Re: [OMPI users] RoCE (IBoE) & OpenMPI
> >
> > On Feb 24, 2011, at 8:00 AM, Michael Shuey wrote:
> >
> > > Late yesterday I did have a chance to test the patch Jeff provided
> > > (against 1.4.3 - testing 1.5.x is on the docket for today). While it
> > > works, in that I can specify a gid_index,
> >
> > Great! I'll commit that to the trunk and start the process of moving it
> > to the v1.5.x series (I know you haven't tested it yet, but it's
> > essentially the same patch, just slightly adjusted for each of the 3
> > branches).
> >
> > > it doesn't do everything required - my traffic won't match a lossless
> > > CoS on the ethernet switch. Specifying a GID is only half of it; I
> > > really need to also specify a service level.
> >
> > RoCE requires the use of the RDMA CM (I think?), and I didn't think there
> > was a way to request a specific SL via the RDMA CM...? (I could certainly
> > be wrong here)
> >
> > I think Mellanox will need to follow up with these questions...
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] RoCE (IBoE) & OpenMPI
So, since RoCE has no SM, and setting an SL is required to get lossless
ethernet on Cisco switches (and possibly others), does this mean that RoCE
will never work correctly with OpenMPI on Cisco hardware?

--
Mike Shuey

On Tue, Mar 1, 2011 at 3:42 AM, Doron Shoham wrote:

> Hi,
>
> Regarding using a specific SL with RDMA CM, I've checked in the code and
> it seems that RDMA_CM uses the SL from the SA.
> So if you want to configure a specific SL, you need to do it via the SM.
>
> Doron
>
> -----Original Message-----
> From: Jeff Squyres [mailto:jsquy...@cisco.com]
> Sent: Thursday, February 24, 2011 3:45 PM
> To: Michael Shuey
> Cc: Open MPI Users, Mike Dubman
> Subject: Re: [OMPI users] RoCE (IBoE) & OpenMPI
>
> On Feb 24, 2011, at 8:00 AM, Michael Shuey wrote:
>
> > Late yesterday I did have a chance to test the patch Jeff provided
> > (against 1.4.3 - testing 1.5.x is on the docket for today). While it
> > works, in that I can specify a gid_index,
>
> Great! I'll commit that to the trunk and start the process of moving it to
> the v1.5.x series (I know you haven't tested it yet, but it's essentially
> the same patch, just slightly adjusted for each of the 3 branches).
>
> > it doesn't do everything required - my traffic won't match a lossless
> > CoS on the ethernet switch. Specifying a GID is only half of it; I
> > really need to also specify a service level.
>
> RoCE requires the use of the RDMA CM (I think?), and I didn't think there
> was a way to request a specific SL via the RDMA CM...? (I could certainly
> be wrong here)
>
> I think Mellanox will need to follow up with these questions...
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
[OMPI users] using MPI through Qt
Hi,

In an effort to make a Qt gui using MPI, I have the following:

1. Gui started in master node.
2. In Gui, through a pushbutton, a global variable x is assigned some value;
   let's say x = 1000;
3. I want this value to be known to all nodes. So I used broadcast in the
   function assigning it on the master node and all other nodes.
4. I printed values of x, which prints all 1000 in all nodes.
5. Now control has reached MPI_Finalize in all nodes except master.

Now if I want to reassign the value of x using the pushbutton in the master
node and again broadcast to and print in all nodes, can it be done?
I mean, can I have an MPI function which through the GUI is called many times
and assigns and prints WHILE the program is running?

OR simply can I have a print function which prints the node rank value in all
nodes whenever the pushbutton is pressed while the program is running.

The command I used is "mpirun -np 3 ./a.out".

Any help will be appreciated.
Thank you very much.

--
eye51