[OMPI users] Configure fails with icc 10.1.008
Hello all, I am unable to get past ./configure, as ICC fails on the C++ tests (see attached ompi-output.tar.gz). Configure was called both without and with sourcing `/opt/intel/cc/10.1.xxx/bin/iccvars.sh`, as per one of the invocation options in icc's docs. I was unable to find the relevant (well... intelligible for me, that is ;P) cause of the failure in config.log. Any help would be appreciated. Thanks, Eric Thibodeau ompi-output.tar.gz Description: application/gzip
Re: [OMPI users] arch question: long running app
Jeff, Thanks for the detailed discussion. It certainly makes things a lot clearer, just as I was giving up my hopes for a reply. The app is fairly heavy on communication (~10k messages per minute) and is also embarrassingly parallel. Taking this into account, I think I'll readjust my resilience expectations and go with MPI, as it will make communications a breeze to deal with. It does make sense to have the ability to add/remove processes on the go. On multi-core hardware, a scheduler could add more processes to an app as the hardware becomes freed up from other tasks. Of course, that would be a problem for apps that require some type of data synchronisation (tightly coupled, as you say). It would be nice to have the option of "mpirun -min 4 -max 16" and let the scheduler optimise based on availability. I'm currently running a test case on two machines with two cores each and, after one day, so far so good. We'll see how it goes. Thanks again dok On Dec 6, 2007 2:06 PM, Jeff Squyres wrote: > It certainly does make sense to use MPI for such a setup. But there > are some important things to consider: > > 1. MPI, at its heart, is a communications system. There's lots of > other bells and whistles (e.g., starting up a whole bunch of processes > in tandem), but at the core: it's all about passing messages. > > 2. MPI tends to lend itself to fairly tightly coupled systems. The > usual model is that you start all of your parallel processes at the > same time (e.g., "mpirun -np 32 my_application"). The current state > of technology is *not* good in terms of fault tolerance -- most MPIs > (Open MPI included) will kill the entire job if any one of those > processes dies. This is an important factor for running for weeks, > months, or years. > > (lots of good research is ongoing about fault tolerance and MPI, but > the existing solutions still emphasize tightly-coupled > applications or require a bunch of involvement from the application) > > 3. 
MPI also emphasizes performance: low latency, high bandwidth, good > concurrency, etc. > > If you don't need these things, for example, if your communication > between manager and worker is infrequent, and/or the overall > application time is not dominated by communication time, you might be > better served for [extremely] long-running applications by using a > simple (but resilient) sockets-based communication layer and not using > MPI. I say this mainly because of the fault tolerance issues involved > and the natural hardware MTBF values that we see on today's hardware. > > Hope that helps. > > > On Dec 4, 2007, at 1:15 PM, doktora v wrote: > > > Hi, although I did my due diligence on searching for this question, > > I apologise if this is a repeat. > > > > From an architectural point of view does it make sense to use MPI in > > the following scenario (for the purposes of resilience as much as > > parallelization): > > > > Each process is a long-running process (runs non-interrupted for > > weeks, months or even years) that collects and crunches some > > streaming data, for example temperature readings, and the data is > > replicated to R nodes. > > > > Because this is a diversion from the normal modus operandi (i.e. all > > data is immediately available), are there any obvious MPI issues that > > I am not considering in designing such an application? > > > > Here is a more detailed description of the app: > > > > A master receives the data and dispatches it according to some > > function such that each tuple is replicated R times to R of the N > > nodes (with R<=N). Suppose that there are K regions from which > > temperature readings stream in in the form of <K,T>, where K is the > > region id and T is the temperature reading. The master sends <K,T> > > to R of the N nodes. These nodes maintain a long-term state of, say, > > the min/max readings. 
If R=N=2, the system is basically duplicated > > and if one of the two nodes dies inadvertently, the other one still > > has accounted for all the data. > > > > Here is some pseudo-code: > > > > int main(argc, argv) > > > > int N=10, R=3, K=200; > > > > Init(argc,argv); > > int rank=COMM_WORLD.Get_rank(); > > if(rank==0) { > > int lastnode = 1; > > while(read from socket) > >for(i in 0:R) COMM_WORLD.Send( ,1,tuple,++lastnode%N,tag); > > } else { > > COMM_WORLD.Recv( ,1,tuple,any,tag,Info); > >process_message( ); > > } > > > > Many thanks for your time! > > Regards > > Dok > > ___ > > users mailing list > > us...@open-mpi.org > > http://www.open-mpi.org/mailman/listinfo.cgi/users > > > -- > Jeff Squyres > Cisco Systems > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
Re: [OMPI users] Using mtrace with openmpi segfaults
On Dec 6, 2007, at 10:14 AM, Sajjad Tabib wrote: Is it possible to disable ptmalloc2 at runtime by disabling the component? Nope -- this one has to be compiled and linked in ahead of time. Sorry. :-\ -- Jeff Squyres Cisco Systems
Re: [OMPI users] Using mtrace with openmpi segfaults
Is there a way to disable this at runtime? Also, can a user app use mallopt options without interfering with the memory managers? We have these options set but are getting memory corruption that moves around realloc in the program. mallopt(M_MMAP_MAX, 0); mallopt(M_TRIM_THRESHOLD, -1); Jeff Squyres Sent by: users-boun...@open-mpi.org 12/06/2007 07:44 AM Please respond to Open MPI Users To Open MPI Users cc Subject Re: [OMPI users] Using mtrace with openmpi segfaults I have not tried to use mtrace myself. But I can see how it would be problematic with OMPI's internal use of ptmalloc2. If you are not using InfiniBand or Myrinet over GM, you don't need OMPI to have an internal copy of ptmalloc2. You can disable OMPI's ptmalloc2 by configuring with: ./configure --without-memory-manager On Dec 3, 2007, at 6:23 PM, Jeffrey M Ceason wrote: > > Having trouble using mtrace with openmpi. Whenever I use the mtrace > call before or after MPI_Init the application terminates. This only > seems to happen using mpi. Is there a way to disable the open-mpi > memory wrappers? Is there known issues with users applications > using mallopts and the mallopts used by open-mpi? > > Machine is AMD64 Fedora Core 7 > [ceason@n01-044-0 minib]$ uname -a > Linux n01-044-0 2.6.22-rr #1 SMP Fri Nov 16 15:28:53 CST 2007 x86_64 > x86_64 x86_64 GNU/Linux > [ceason@n01-044-0 minib]$ > > > Test source. > #include > #include > #include > #include > #include > #include > #include > > using namespace std; > > int main (int argc,char * argv[]) { > mtrace(); > MPI_Init(NULL,NULL); > >MPI_Finalize(); > } > > > [ceason@n01-044-0 minib]$ mpiCC dacs_test.cc -o trace_test > [ceason@n01-044-0 minib]$ mpirun -np 1 trace_test > mpirun noticed that job rank 0 with PID 7078 on node n01-044-0 > exited on signal 8 (Floating point exception). > [ceason@n01-044-0 minib]$ > > > backtrace of core > > Core was generated by `trace_test'. > Program terminated with signal 8, Arithmetic exception. 
> #0 0x2b33169d4abc in sYSTRIm () from /lib64/libc.so.6 > (gdb) bt > #0 0x2b33169d4abc in sYSTRIm () from /lib64/libc.so.6 > #1 0x2b33169d71c2 in _int_free () from /lib64/libc.so.6 > #2 0x2b33169dab1c in free () from /lib64/libc.so.6 > #3 0x2b33169dcee8 in tr_freehook () from /lib64/libc.so.6 > #4 0x2b33157674f3 in free () from /usr/local/openmpi/lib64/ > libopen-pal.so.0 > #5 0x2b33169ceaf1 in vasprintf () from /lib64/libc.so.6 > #6 0x2b33169b3088 in asprintf () from /lib64/libc.so.6 > #7 0x2b3315760c7d in opal_output_init () from /usr/local/ > openmpi/lib64/libopen-pal.so.0 > #8 0x2b3315760a2a in do_open () from /usr/local/openmpi/lib64/ > libopen-pal.so.0 > #9 0x2b331575f958 in opal_malloc_init () from /usr/local/ > openmpi/lib64/libopen-pal.so.0 > #10 0x2b331574ac27 in opal_init_util () from /usr/local/openmpi/ > lib64/libopen-pal.so.0 > #11 0x2b331574ad06 in opal_init () from /usr/local/openmpi/lib64/ > libopen-pal.so.0 > #12 0x2b3315283edf in ompi_mpi_init () from /usr/local/openmpi/ > lib64/libmpi.so.0 > #13 0x2b33152a54f0 in PMPI_Init () from /usr/local/openmpi/lib64/ > libmpi.so.0 > #14 0x00408397 in main () > (gdb) > > Shouldn't involve communications between machines but here is the IB > Info. > > [ceason@n01-044-0 minib]$ ibv_devinfo > hca_id: mlx4_0 > fw_ver: 2.2.000 > node_guid: 0002:c903::17d0 > sys_image_guid: 0002:c903::17d3 > vendor_id: 0x02c9 > vendor_part_id: 25418 > hw_ver: 0xA0 > board_id: MT_04A0110002 > phys_port_cnt: 2 > port: 1 > state: PORT_ACTIVE (4) > max_mtu:2048 (4) > active_mtu: 2048 (4) > sm_lid: 1 > port_lid: 8 > port_lmc: 0x00 > > port: 2 > state: PORT_DOWN (1) > max_mtu:2048 (4) > active_mtu: 2048 (4) > sm_lid: 0 > port_lid: 0 > port_lmc: 0x00 > > [ceason@n01-044-0 minib]$ > > [ceason@n01-044-0 minib]$ ulimit -l > unlimited > > > > > ompi_info -all output > > Open MPI: 1.2.4 >Open MPI SVN revision: r16187 > Open RTE: 1.2.4 >Open RTE SVN revision: r16187 > OPAL: 1.2.4 >OPAL SVN revision: r16187 >MCA backtrace: execinfo (MCA v1.0,
Re: [OMPI users] Using mtrace with openmpi segfaults
Hi, Is it possible to disable ptmalloc2 at runtime by disabling the component? Thanks, Sajjad Tabib Jeff SquyresSent by: users-boun...@open-mpi.org 12/06/07 07:44 AM Please respond to Open MPI Users To Open MPI Users cc Subject Re: [OMPI users] Using mtrace with openmpi segfaults I have not tried to use mtrace myself. But I can see how it would be problematic with OMPI's internal use of ptmalloc2. If you are not using InfiniBand or Myrinet over GM, you don't need OMPI to have an internal copy of ptmalloc2. You can disable OMPI's ptmalloc2 by configuring with: ./configure --without-memory-manager On Dec 3, 2007, at 6:23 PM, Jeffrey M Ceason wrote: > > Having trouble using mtrace with openmpi. Whenever I use the mtrace > call before or after MPI_Init the application terminates. This only > seems to happen using mpi. Is there a way to disable the open-mpi > memory wrappers? Is there known issues with users applications > using mallopts and the mallopts used by open-mpi? > > Machine is AMD64 Fedora Core 7 > [ceason@n01-044-0 minib]$ uname -a > Linux n01-044-0 2.6.22-rr #1 SMP Fri Nov 16 15:28:53 CST 2007 x86_64 > x86_64 x86_64 GNU/Linux > [ceason@n01-044-0 minib]$ > > > Test source. > #include > #include > #include > #include > #include > #include > #include > > using namespace std; > > int main (int argc,char * argv[]) { > mtrace(); > MPI_Init(NULL,NULL); > >MPI_Finalize(); > } > > > [ceason@n01-044-0 minib]$ mpiCC dacs_test.cc -o trace_test > [ceason@n01-044-0 minib]$ mpirun -np 1 trace_test > mpirun noticed that job rank 0 with PID 7078 on node n01-044-0 > exited on signal 8 (Floating point exception). > [ceason@n01-044-0 minib]$ > > > backtrace of core > > Core was generated by `trace_test'. > Program terminated with signal 8, Arithmetic exception. 
> #0 0x2b33169d4abc in sYSTRIm () from /lib64/libc.so.6 > (gdb) bt > #0 0x2b33169d4abc in sYSTRIm () from /lib64/libc.so.6 > #1 0x2b33169d71c2 in _int_free () from /lib64/libc.so.6 > #2 0x2b33169dab1c in free () from /lib64/libc.so.6 > #3 0x2b33169dcee8 in tr_freehook () from /lib64/libc.so.6 > #4 0x2b33157674f3 in free () from /usr/local/openmpi/lib64/ > libopen-pal.so.0 > #5 0x2b33169ceaf1 in vasprintf () from /lib64/libc.so.6 > #6 0x2b33169b3088 in asprintf () from /lib64/libc.so.6 > #7 0x2b3315760c7d in opal_output_init () from /usr/local/ > openmpi/lib64/libopen-pal.so.0 > #8 0x2b3315760a2a in do_open () from /usr/local/openmpi/lib64/ > libopen-pal.so.0 > #9 0x2b331575f958 in opal_malloc_init () from /usr/local/ > openmpi/lib64/libopen-pal.so.0 > #10 0x2b331574ac27 in opal_init_util () from /usr/local/openmpi/ > lib64/libopen-pal.so.0 > #11 0x2b331574ad06 in opal_init () from /usr/local/openmpi/lib64/ > libopen-pal.so.0 > #12 0x2b3315283edf in ompi_mpi_init () from /usr/local/openmpi/ > lib64/libmpi.so.0 > #13 0x2b33152a54f0 in PMPI_Init () from /usr/local/openmpi/lib64/ > libmpi.so.0 > #14 0x00408397 in main () > (gdb) > > Shouldn't involve communications between machines but here is the IB > Info. 
> > [ceason@n01-044-0 minib]$ ibv_devinfo > hca_id: mlx4_0 > fw_ver: 2.2.000 > node_guid: 0002:c903::17d0 > sys_image_guid: 0002:c903::17d3 > vendor_id: 0x02c9 > vendor_part_id: 25418 > hw_ver: 0xA0 > board_id: MT_04A0110002 > phys_port_cnt: 2 > port: 1 > state: PORT_ACTIVE (4) > max_mtu:2048 (4) > active_mtu: 2048 (4) > sm_lid: 1 > port_lid: 8 > port_lmc: 0x00 > > port: 2 > state: PORT_DOWN (1) > max_mtu:2048 (4) > active_mtu: 2048 (4) > sm_lid: 0 > port_lid: 0 > port_lmc: 0x00 > > [ceason@n01-044-0 minib]$ > > [ceason@n01-044-0 minib]$ ulimit -l > unlimited > > > > > ompi_info -all output > > Open MPI: 1.2.4 >Open MPI SVN revision: r16187 > Open RTE: 1.2.4 >Open RTE SVN revision: r16187 > OPAL: 1.2.4 >OPAL SVN revision: r16187 >MCA backtrace: execinfo (MCA v1.0, API v1.0, Component > v1.2.4) > MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component > v1.2.4) >MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.4) >
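Since (per Jeff's reply above) ptmalloc2 cannot be disabled at runtime in this series, the fix boils down to a rebuild. As a concrete sketch (the prefix path is an example), the sequence and a quick check that the component is gone might look like:

```shell
# Rebuild Open MPI without its internal ptmalloc2 memory manager
# (only needed for InfiniBand / Myrinet-over-GM memory registration).
./configure --without-memory-manager --prefix=/usr/local/openmpi
make all install

# Afterwards, the "MCA memory: ptmalloc2" line shown in the
# ompi_info output above should no longer appear:
ompi_info | grep "MCA memory"
```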
Re: [OMPI users] OpenMP + OpenMPI
On Dec 6, 2007, at 9:54 AM, Durga Choudhury wrote: Automatically striping large messages across multiple NICs is certainly a very nice feature; I was not aware that OpenMPI does this transparently. (I wonder if other MPI implementations do this or not). However, I have the following concern: Since the communication over an ethernet NIC is most likely over IP, does it take into account the route cost when striping messages? For example, hosts A and B in the MPD ring might be connected via two NICs, one direct and one via an intermediate router, or one with a large bandwidth and another with a small bandwidth. Does OpenMPI send a smaller chunk of data over a route with a higher cost? Not unless you tell it. In IB networks, the network API exposes bandwidth differences of the NIC and Open MPI takes that into account when deciding how much data to send down each endpoint. Open MPI does not currently know anything about / try to optimize based on the costs of different routes. On a TCP network, whether you go through 2 or 3 switches -- does it really matter? The latency is so high that adding another switch (or 2 or 3 or ...) may not make much of a difference anyway. Raw bandwidth differences between two networks will make a difference, but number of hops -- as long as they're not *too* different -- might not. Also consider: if you're combining 100Mbps and 1Gbps ethernet networks -- is it really worth it? If your goal is simple bandwidth addition, note that you're adding a fraction of the capability to the 1Gbps network at the cost of additional complexity in your software and/or fragmentation/reassembly penalties. Will you really see more delivered bandwidth? It's probably dependent upon your application (e.g., are you continually sending very large messages?). You might get much more bang for your buck if you combine like networks (e.g., 2x100Mbps or 2x1Gbps) because you'll be [potentially] doubling your bandwidth. 
Because of this concern, I think the channel bonding approach someone else suggested is preferable; all these details will be taken care of at the hardware level instead of at the IP level. That's not quite true. Both approaches are handled in software; one is in the kernel, the other is in the middleware. The hardware is unaware that you are striping large messages. -- Jeff Squyres Cisco Systems
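If you do stripe over like networks at the MPI level, Open MPI's TCP BTL lets you control which interfaces participate via its standard interface-selection MCA parameters (the interface and program names below are examples, not taken from this thread):

```shell
# Stripe large messages over only the two GigE NICs:
mpirun -np 4 --mca btl_tcp_if_include eth0,eth1 ./my_app

# Equivalently, exclude loopback and the slow NIC you don't want used:
mpirun -np 4 --mca btl_tcp_if_exclude lo,eth2 ./my_app
```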
Re: [OMPI users] suggested intel compiler version for openmpi-1.2.4
> -Original Message- > From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On > Behalf Of Jeff Squyres > > If Intel is telling you that they don't support your glibc version, I > wouldn't be surprised by random segv's in any application that you > build on that platform (including Open MPI). The install script of the 10.1.008 suite lists the supported platform. That includes kernel 2.6 and glibc 2.6. I guess there are some loose ends. -- Valmor > > > > On Dec 5, 2007, at 9:59 AM, de Almeida, Valmor F. wrote: > > > > > > > Attached is the config.log file for intel 9.1.052. I've also tried the > > intel 10.1.008 and posted to this list the config.log on a previous > > e-mail (12/2/2007). > > > > Here are the key commands for building and installing > > > > export FC=ifort > > export F77=ifort > > export CXX=icpc > > export CC=icc > > > > ./configure --prefix=/usr/local/packages/openmpi-1.2.4_intel-9.1.052 > > --with-mpi-param_check=always --with-mpi-f90-size=medium > > --with-f90-max-array-dim=4 CC=icc CXX=icpc F90=ifort F77=ifort > > > > make -j 2 > > make -j 2 install > > > > Here is some info on my system > > > > ->emerge --info > > Portage 2.1.3.19 (default-linux/x86/2007.0, gcc-4.1.2, glibc-2.6.1-r0, > > 2.6.22.9 i686) > > = > > System uname: 2.6.22.9 i686 Intel(R) Xeon(TM) CPU 2.66GHz > > Timestamp of tree: Sat, 17 Nov 2007 04:30:01 + > > app-shells/bash: 3.2_p17 > > dev-java/java-config: 1.3.7, 2.0.33-r1 > > dev-lang/python: 2.4.4-r6 > > dev-python/pycrypto: 2.0.1-r6 > > sys-apps/baselayout: 1.12.9-r2 > > sys-apps/sandbox:1.2.18.1-r2 > > sys-devel/autoconf: 2.13, 2.61-r1 > > sys-devel/automake: 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10 > > sys-devel/binutils: 2.18-r1 > > sys-devel/gcc-config: 1.3.16 > > sys-devel/libtool: 1.5.24 > > virtual/os-headers: 2.6.22-r2 > > > > Here is > > > > ->echo $PATH > > /opt/cubit:/opt/gambit/bin:/opt/compilers/intel/idb/9.1.052/bin:/opt/ > > com > > 
pilers/intel/fc/9.1.052/bin:/opt/compilers/intel/cc/9.1.052/bin:/usr/ > > loc > > al/nwchem/bin:/usr/local/visit/bin:/usr/local/ompi_intel/bin:/usr/ > > local/ > > sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/bin:/usr/i686- > > pc- > > linux-gnu/gcc-bin/4.1.2:/usr/local/packages/cca- > > tools-0.6.4_gcc-4.1.2/bi > > n:/usr/bin:/opt/chpl/bin/linux > > > > Here is > > > > ->echo $LD_LIBRARY_PATH > > /opt/compilers/intel/fc/9.1.052/lib:/opt/compilers/intel/cc/9.1.052/ > > lib: > > /usr/local/ompi_intel/lib::/usr/local/packages/cca- > > tools-0.6.4_gcc-4.1.2 > > /lib > > > > > > I am also copying this to the person at Intel who is looking at the > > problem, and here is a posting from the Intel Premier Support (Issue > > number: 461117) for this case > > > > > > 12/04/2007 > > Valmor, > > Thanks for your submission. The Intel compilers have supported the 2.6 > > Linux kernel for some time now, but they do not yet support glibc 2.6. > > The most recent version of glibc supported is 2.5, as represented by > > the > > Red Hat Enterprise Linux 5 and Ubuntu 7.04 distributions. There are > > known issues with Ubuntu 7.10, for example, which has a later glibc. > > What version of Gentoo are you using, and do you have the option to > > try > > an older glibc and gcc along with the Intel compiler version 10.1? > > > > It's not sure that the glibc version is the problem, so I will try to > > build OpenMPI with glibc 2.5 on Red Hat EL5 or Ubuntu 7.04 to see if > > that shows any problems. > > > > Regards, > > Martyn > > > > 12/05/2007 > > Martyn, > > > > During the installation of 10.1.008, the install script says that > > glibc > > 2.6 is supported. > > > > -- > > Valmor > > > > > > Thanks for your help. 
> > > > -- > > Valmor > > > > > > > > > > > > > > > > > >> -Original Message- > >> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] > > On > >> Behalf Of Jeff Squyres > >> Sent: Tuesday, December 04, 2007 5:58 PM > >> To: Open MPI Users > >> Subject: Re: [OMPI users] suggested intel compiler version for > > openmpi- > >> 1.2.4 > >> > >> I have compiled Open MPI with Intel 10.0 and 9.1 with no problems on > >> RHEL4U4. > >> > >> Can you send all the info that you can (obviously, ompi_info won't > >> run) from http://www.open-mpi.org/community/help/ ? > >> > >> > >> > >> On Dec 4, 2007, at 4:26 PM, de Almeida, Valmor F. wrote: > >> > >>> > >>> Hello, > >>> > >>> What is the suggested intel compiler version to compile > > openmpi-1.2.4? > >>> > >>> I tried versions 10.1.008 and 9.1.052 and no luck in getting a > > working > >>> library. In both cases I get: > >>> > >>> ->mpic++ --showme > >>> Segmentation fault > >>> > >>> ->ompi_info > >>> Segmentation fault > >>> > >>> Thanks for your help. > >>> > >>> -- > >>> Valmor de Almeida > >>> > >>> > >>> > >>> > >>> > >>> ___ > >>> users mailing list > >>> us...@open-mpi.org > >>>
Re: [OMPI users] OpenMP + OpenMPI
Automatically striping large messages across multiple NICs is certainly a very nice feature; I was not aware that OpenMPI does this transparently. (I wonder if other MPI implementations do this or not). However, I have the following concern: Since the communication over an ethernet NIC is most likely over IP, does it take into account the route cost when striping messages? For example, host A and B in the MPD ring might be connected via two NICs, one direct and one via an intermediate router, or one with a large bandwidth and another with a small bandwidth. Does OpenMPI send a smaller chunk of data over a route with a higher cost? Because of this concern, I think the channel bonding approach someone else suggested is more preferable; all these details will be taken care of at the hardware level instead of at the IP level. Thanks Durga On Dec 6, 2007 9:42 AM, Jeff Squyreswrote: > Wow, that's quite a .sig. :-) > > Open MPI will automatically stripe large messages across however many > NICs you have. So you shouldn't need to use multiple threads. > > The threading support in the OMPI v1.2 series is broken; it's not > worth using. There's a big warning in configure when you enable it. :-) > > > On Dec 5, 2007, at 9:57 PM, Tee Wen Kai wrote: > > > Hi everyone, > > > > I have installed openmpi-1.2.3. My system has two ethernet ports. > > Thus, I am trying to make use of both ports to speed up the > > communication process by using openmp to split into two threads. > > However, this implementation always cause error. Then I realized > > that I need to build openmpi using --enable-mpi-threads and use > > MPI_Init_thread to initialize. But, the initialization always return > > MPI_THREAD_SINGLE no matter what value I set. Using ompi_info|grep > > Thread, it shows that thread support has already been activated. 
> > Thus, I seek your help to teach me what other configurations I need > > to set in order to use multi-threads and what are the parameters to > > include in mpirun in order to use the two ethernet ports. > > > > Thank you very much. > > > > Regards, > > Tee > > > > > > > > _ > > > > > > > > Many of us spend our time wishing for things we could have if we > > didn't spend half our time wishing. > > > > Looking for last minute shopping deals? Find them fast with Yahoo! > > Search.___ > > users mailing list > > us...@open-mpi.org > > http://www.open-mpi.org/mailman/listinfo.cgi/users > > > -- > Jeff Squyres > Cisco Systems > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users > -- Its a battle between humans and communists; Which side are you in? .
Re: [OMPI users] OpenMP + OpenMPI
Wow, that's quite a .sig. :-) Open MPI will automatically stripe large messages across however many NICs you have. So you shouldn't need to use multiple threads. The threading support in the OMPI v1.2 series is broken; it's not worth using. There's a big warning in configure when you enable it. :-) On Dec 5, 2007, at 9:57 PM, Tee Wen Kai wrote: Hi everyone, I have installed openmpi-1.2.3. My system has two ethernet ports. Thus, I am trying to make use of both ports to speed up the communication process by using openmp to split into two threads. However, this implementation always cause error. Then I realized that I need to build openmpi using --enable-mpi-threads and use MPI_Init_thread to initialize. But, the initialization always return MPI_THREAD_SINGLE no matter what value I set. Using ompi_info|grep Thread, it shows that thread support has already been activated. Thus, I seek your help to teach me what other configurations I need to set in order to use multi-threads and what are the parameters to include in mpirun in order to use the two ethernet ports. Thank you very much. Regards, Tee _ Many of us spend our time wishing for things we could have if we didn't spend half our time wishing. Looking for last minute shopping deals? Find them fast with Yahoo! Search.___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users -- Jeff Squyres Cisco Systems
Re: [OMPI users] Q: Problems launching MPMD applications? ('mca_oob_tcp_peer_try_connect' error 103)
On 12/5/07 8:47 AM, "Brian Dobbins" wrote: > Hi Josh, > >> I believe the problem is that you are only applying the MCA >> parameters to the first app context instead of all of them: > > Thank you very much... applying the parameters with -gmca works fine with the > test case (and I'll try the actual one soon). However, and this is minor > since it works with method (1),... > >> There are two main ways of doing this: >> 2) Alternatively you can duplicate the MCA parameters for each app context: > > .. This actually doesn't work. I had thought of that and tried it, and I > still get the same connection problems. I just rechecked this again to be > sure. That is correct - the root problem here is that the command line MCA params are not propagated to the remote daemons when we launch in 1.2. So launch of the remote daemons fails as they are not looking at the correct interface to link themselves into the system. The apps themselves would have launched okay given the duplicate MCA params as we store the params for each app_context and pass them along when the daemon spawns them - you just can't get them launched because the daemons fail first. The aggregated MCA params flow through a different mechanism altogether, which is why they work. We have fixed this on our development trunk so the command line params get passed - should work fine in future releases. Ralph > > Again, many thanks for the help! > > With best wishes, > - Brian > > > Brian Dobbins > Yale University HPC > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
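The two methods discussed can be seen side by side in a sketch like the following (program names and the interface are hypothetical; the oob/btl TCP parameter names are those of the 1.2-era components). As Ralph explains, in the 1.2 series only the global form reaches the remote daemons:

```shell
# (1) Global MCA params: -gmca applies to every app context AND the daemons.
mpirun -gmca oob_tcp_include eth1 -gmca btl_tcp_if_include eth1 \
       -np 1 ./master : -np 4 ./worker

# (2) Per-app-context duplication: the apps would get these, but in 1.2
#     the remote daemons never see them, so launch can still fail.
mpirun -mca oob_tcp_include eth1 -np 1 ./master : \
       -mca oob_tcp_include eth1 -np 4 ./worker
```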
Re: [OMPI users] suggested intel compiler version for openmpi-1.2.4
If Intel is telling you that they don't support your glibc version, I wouldn't be surprised by random segv's in any application that you build on that platform (including Open MPI). On Dec 5, 2007, at 9:59 AM, de Almeida, Valmor F. wrote: Attached is the config.log file for intel 9.1.052. I've also tried the intel 10.1.008 and posted to this list the config.log on a previous e-mail (12/2/2007). Here are the key commands for building and installing export FC=ifort export F77=ifort export CXX=icpc export CC=icc ./configure --prefix=/usr/local/packages/openmpi-1.2.4_intel-9.1.052 --with-mpi-param_check=always --with-mpi-f90-size=medium --with-f90-max-array-dim=4 CC=icc CXX=icpc F90=ifort F77=ifort make -j 2 make -j 2 install Here is some info on my system ->emerge --info Portage 2.1.3.19 (default-linux/x86/2007.0, gcc-4.1.2, glibc-2.6.1-r0, 2.6.22.9 i686) = System uname: 2.6.22.9 i686 Intel(R) Xeon(TM) CPU 2.66GHz Timestamp of tree: Sat, 17 Nov 2007 04:30:01 + app-shells/bash: 3.2_p17 dev-java/java-config: 1.3.7, 2.0.33-r1 dev-lang/python: 2.4.4-r6 dev-python/pycrypto: 2.0.1-r6 sys-apps/baselayout: 1.12.9-r2 sys-apps/sandbox:1.2.18.1-r2 sys-devel/autoconf: 2.13, 2.61-r1 sys-devel/automake: 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10 sys-devel/binutils: 2.18-r1 sys-devel/gcc-config: 1.3.16 sys-devel/libtool: 1.5.24 virtual/os-headers: 2.6.22-r2 Here is ->echo $PATH /opt/cubit:/opt/gambit/bin:/opt/compilers/intel/idb/9.1.052/bin:/opt/ com pilers/intel/fc/9.1.052/bin:/opt/compilers/intel/cc/9.1.052/bin:/usr/ loc al/nwchem/bin:/usr/local/visit/bin:/usr/local/ompi_intel/bin:/usr/ local/ sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/bin:/usr/i686- pc- linux-gnu/gcc-bin/4.1.2:/usr/local/packages/cca- tools-0.6.4_gcc-4.1.2/bi n:/usr/bin:/opt/chpl/bin/linux Here is ->echo $LD_LIBRARY_PATH /opt/compilers/intel/fc/9.1.052/lib:/opt/compilers/intel/cc/9.1.052/ lib: /usr/local/ompi_intel/lib::/usr/local/packages/cca- tools-0.6.4_gcc-4.1.2 /lib I am also copying this to 
the person at Intel who is looking at the problem, and here is a posting from the Intel Premier Support (Issue number: 461117) for this case 12/04/2007 Valmor, Thanks for your submission. The Intel compilers have supported the 2.6 Linux kernel for some time now, but they do not yet support glibc 2.6. The most recent version of glibc supported is 2.5, as represented by the Red Hat Enterprise Linux 5 and Ubuntu 7.04 distributions. There are known issues with Ubuntu 7.10, for example, which has a later glibc. What version of Gentoo are you using, and do you have the option to try an older glibc and gcc along with the Intel compiler version 10.1? It's not sure that the glibc version is the problem, so I will try to build OpenMPI with glibc 2.5 on Red Hat EL5 or Ubuntu 7.04 to see if that shows any problems. Regards, Martyn 12/05/2007 Martyn, During the installation of 10.1.008, the install script says that glibc 2.6 is supported. -- Valmor Thanks for your help. -- Valmor -Original Message- From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres Sent: Tuesday, December 04, 2007 5:58 PM To: Open MPI Users Subject: Re: [OMPI users] suggested intel compiler version for openmpi- 1.2.4 I have compiled Open MPI with Intel 10.0 and 9.1 with no problems on RHEL4U4. Can you send all the info that you can (obviously, ompi_info won't run) from http://www.open-mpi.org/community/help/ ? On Dec 4, 2007, at 4:26 PM, de Almeida, Valmor F. wrote: Hello, What is the suggested intel compiler version to compile openmpi-1.2.4? I tried versions 10.1.008 and 9.1.052 and no luck in getting a working library. In both cases I get: ->mpic++ --showme Segmentation fault ->ompi_info Segmentation fault Thanks for your help. 
-- Valmor de Almeida ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users -- Jeff Squyres Cisco Systems ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users -- Jeff Squyres Cisco Systems
Re: [MTT users] [OMPI devel] Using MTT to test the newly added SCTP BTL
On Dec 5, 2007, at 1:42 PM, Karol Mroz wrote: Removal of .ompi_ignore should not create build problems for anyone who is running without some form of SCTP support. To test this claim, we built Open MPI with .ompi_ignore removed and no SCTP support on both an ubuntu linux and an OSX machine. Both builds succeeded without any problem. In light of the above, are there any objections to us removing the .ompi_ignore file from the SCTP BTL code? Thanks for your persistence on this. :-) I think that since no one has objected, you should feel free to do so. I tried to work around this problem by using a pre-installed version of Open MPI to run MTT tests on (ibm tests initially) but all I get is a short summary from MTT that things succeeded, instead of a detailed list of specific test successes/failures as is shown when using a nightly tarball. MTT has several different reporters; the default "file" reporter simply outputs a summary to stdout upon completion. The intention is that the file reporter would be used by developers for quick/ interactive tests to verify that you hadn't broken anything; more details are available in the meta data files in the scratch tree if you know where to look. We intended that MTT's database reporter would usually be used for common testing, etc. The web interface is [by far] the easiest way to drill down in the results to see the details of what you need to know about individual failures, etc. The 'tests' also complete much faster which sparks some concern as to whether they were actually run. If you just manually add the sctp btl directory to an existing tarball, I'm pretty sure that it won't build. OMPI's build system is highly dependent upon its "autogen" procedure, which creates a hard- coded list of components to build. 
For a tarball, that procedure has already completed, and even if you add in more component directories after you expand the tarball, the hard-coded lists won't be updated; therefore OMPI's configure/build system will skip them.

> Furthermore, MTT puts the source into a new 'random' directory prior to building (way around this?)

No. The internal directory structure of the scratch tree, as you noted, uses random directory names. This is for two reasons:

1. MTT can't know ahead of time what you are going to tell it to do.

2. One obvious way to have non-random directory names is to use the names of the INI file sections as various directory levels. However, this creates Very, Very Long directory names in the scratch tree, and some compilers have a problem with this (even though the total filenames are within the filesystem limit). Hence, we came up with the scheme of using short, random directory names that guarantees that the total filename length is short.

Note that for human convenience, MTT *also* puts in symlinks, named after the INI section names, that point to the short random directory names. So if a human needs to go into the scratch tree to investigate some failures, it should be pretty easy to navigate using the symlinks (vs. the short/random names).

> ... so I can't add the SCTP directory by hand and then run the build/installation phase. Adding the code on the fly during the installation phase also does not work. Any advice in this matter? Thanks again everyone.

-- Karol Mroz km...@cs.ubc.ca
___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel
-- Jeff Squyres Cisco Systems
Re: [OMPI users] Simple MPI_Comm_spawn program hangs
To add more info, here is a backtrace of the spawned (hung) program.

(gdb) bt
#0  0xe410 in __kernel_vsyscall ()
#1  0x402cdaec in sched_yield () from /lib/tls/libc.so.6
#2  0x4016360c in opal_progress () at runtime/opal_progress.c:301
#3  0x403a9b29 in mca_oob_tcp_msg_wait (msg=0x805cc70, rc=0xbfffba40) at oob_tcp_msg.c:108
#4  0x403b09a5 in mca_oob_tcp_recv (peer=0xbfffbba8, iov=0xbfffba88, count=1, tag=0, flags=4) at oob_tcp_recv.c:138
#5  0x40119420 in mca_oob_recv_packed (peer=0xbfffbba8, buf=0x821b200, tag=0) at base/oob_base_recv.c:69
#6  0x4003c28b in ompi_comm_allreduce_intra_oob (inbuf=0xbfffbb48, outbuf=0xbfffbb44, count=1, op=0x400d14a0, comm=0x8049d38, bridgecomm=0x0, lleader=0xbfffbc04, rleader=0xbfffbba8, send_first=1) at communicator/comm_cid.c:674
#7  0x4003adf2 in ompi_comm_nextcid (newcomm=0x807c4f8, comm=0x8049d38, bridgecomm=0x0, local_leader=0xbfffbc04, remote_leader=0xbfffbba8, mode=256, send_first=1) at communicator/comm_cid.c:176
#8  0x4003cc2c in ompi_comm_connect_accept (comm=0x8049d38, root=0, port=0x807a5c0, send_first=1, newcomm=0xbfffbc28, tag=2000) at communicator/comm_dyn.c:208
#9  0x4003ec97 in ompi_comm_dyn_init () at communicator/comm_dyn.c:668
#10 0x4005465a in ompi_mpi_init (argc=1, argv=0xbfffbf64, requested=0, provided=0xbfffbd14) at runtime/ompi_mpi_init.c:704
#11 0x40090367 in PMPI_Init (argc=0xbfffbee0, argv=0xbfffbee4) at pinit.c:71
#12 0x08048983 in main (argc=1, argv=0xbfffbf64) at slave.c:43
(gdb)

Prakash

On Dec 6, 2007, at 12:08 AM, Prakash Velayutham wrote:

Hi Edgar, I changed the spawned program from /bin/hostname to a very simple MPI program as below. But now, the slave hangs right at the MPI_Init line. What could the issue be?
slave.c

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include "mpi.h"
#include <sys/types.h>   /* standard system types */
#include <netinet/in.h>  /* Internet address structures */
#include <sys/socket.h>  /* socket interface functions */
#include <netdb.h>       /* host to IP resolution */

int gdb_var;

void main(int argc, char **argv)
{
    int tag = 0;
    int my_rank;
    int num_proc;
    MPI_Status status;
    MPI_Comm inter_comm;
    gdb_var = 0;
    char hostname[64];
    FILE *f;

    while (0 == gdb_var)
        sleep(5);

    gethostname(hostname, 64);
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &num_proc);
    MPI_Comm_get_parent(&inter_comm);
    MPI_Finalize();
    exit(0);
}

Thanks, Prakash

On Dec 2, 2007, at 8:36 PM, Edgar Gabriel wrote:

MPI_Comm_spawn is tested nightly by our test suites, so it should definitely work... Thanks Edgar

Prakash Velayutham wrote:

Thanks Edgar. I did not know that. Really? Anyways, you are sure an MPI job will work as a spawned process instead of "hostname"? Thanks, Prakash

On Dec 1, 2007, at 5:56 PM, Edgar Gabriel wrote:

MPI_Comm_spawn has to build an intercommunicator with the child process that it spawns. Thus, you cannot spawn a non-MPI job such as /bin/hostname, since the parent process waits for some messages from the child process(es) in order to set up the intercommunicator. Thanks Edgar

Prakash Velayutham wrote:

Hello, Open MPI 1.2.4. I am trying to run a simple C program.

##
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "mpi.h"

void main(int argc, char **argv)
{
    int tag = 0;
    int my_rank;
    int num_proc;
    char message_0[] = "hello slave, i'm your master";
    char message_1[50];
    char master_data[] = "slaves to work";
    int array_of_errcodes[10];
    int num;
    MPI_Status status;
    MPI_Comm inter_comm;
    MPI_Info info;
    int arr[1];
    int rc1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &num_proc);
    printf("MASTER : spawning a slave ... 
\n");
    rc1 = MPI_Comm_spawn("/bin/hostname", MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter_comm, arr);
    MPI_Finalize();
    exit(0);
}
##

This program hangs as below:

prakash@bmi-xeon1-01:~/thesis/CS/Samples> ./master1
MASTER : spawning a slave ...
bmi-xeon1-01

Any ideas why? Thanks, Prakash

___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
-- Edgar Gabriel Assistant Professor Parallel Software Technologies Lab http://pstl.cs.uh.edu Department of Computer Science University of Houston Philip G. Hoffman Hall, Room 524, Houston, TX-77204, USA Tel: +1 (713) 743-3857 Fax: +1 (713) 743-3335