[OMPI users] compilation error using pgf90 ver 9.0
Dear open-mpi users: I got the following error while compiling Open MPI using pgf90 ver 9 and CC=gcc. How can I run make while avoiding the -pthread flag?

pgf90-Error-Unknown switch: -pthread
make[4]: *** [libmpi_f90.la] Error 1
make[4]: Leaving directory `/home/mohamed/bin/openmpi-1.4.1/build/ompi/mpi/f90'
make[3]: *** [all-recursive] Error 1
make[3]: Leaving directory `/home/mohamed/bin/openmpi-1.4.1/build/ompi/mpi/f90'
make[2]: *** [all] Error 2
make[2]: Leaving directory `/home/mohamed/bin/openmpi-1.4.1/build/ompi/mpi/f90'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/mohamed/bin/openmpi-1.4.1/build/ompi'
make: *** [all-recursive] Error 1

I would appreciate it if anyone could help me. Best regards, M. Makhyoun
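A workaround sometimes suggested for PGI compilers (a sketch, not from this thread: it assumes PGI's -noswitcherror flag, which makes pgf90 warn about, rather than reject, switches it does not recognize) is to pass that flag in through the Fortran flags at configure time:

```shell
# Hedged sketch: assumes PGI's -noswitcherror flag, which downgrades
# unknown switches (such as the -pthread that libtool inserts) from
# hard errors to warnings in pgf90.
./configure CC=gcc F77=pgf90 FC=pgf90 \
    FFLAGS=-noswitcherror FCFLAGS=-noswitcherror
make all
```

Whether the flag names (F77/FC, FFLAGS/FCFLAGS) are the ones your Open MPI 1.4.1 configure honors should be checked against ./configure --help.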
[OMPI users] Problem including C MPI code from C++ using C linkage
Hi all, I have C MPI code that I need to link into my C++ code. As usual, from my C++ code, I do

extern "C" {
#include "c-code.h"
}

where c-code.h includes, among other things, mpi.h. This doesn't work, because mpi.h detects whether it is being compiled as C or C++ and includes mpicxx.h if the language is C++. The problem is that mpicxx.h cannot be compiled under C linkage, so the compilation dies with errors like:

mpic++ -I. -I$HOME/include/libPJutil -I$HOME/code/arepo -m32 arepotest.cc -I$HOME/include -I/sw/include -L/sw/lib -L$HOME/code/arepo -larepo -lhdf5 -lgsl -lgmp -lmpi
In file included from /usr/include/c++/4.2.1/map:65,
                 from /sw/include/openmpi/ompi/mpi/cxx/mpicxx.h:36,
                 from /sw/include/mpi.h:1886,
                 from /Users/patrik/code/arepo/allvars.h:23,
                 from /Users/patrik/code/arepo/proto.h:2,
                 from arepo_grid.h:36,
                 from arepotest.cc:3:
/usr/include/c++/4.2.1/bits/stl_tree.h:134: error: template with C linkage
/usr/include/c++/4.2.1/bits/stl_tree.h:145: error: declaration of C function 'const std::_Rb_tree_node_base* std::_Rb_tree_increment(const std::_Rb_tree_node_base*)' conflicts with
/usr/include/c++/4.2.1/bits/stl_tree.h:142: error: previous declaration 'std::_Rb_tree_node_base* std::_Rb_tree_increment(std::_Rb_tree_node_base*)' here
/usr/include/c++/4.2.1/bits/stl_tree.h:151: error: declaration of C function 'const std::_Rb_tree_node_base* std::_Rb_tree_decrement(const std::_Rb_tree_node_base*)' conflicts with
/usr/include/c++/4.2.1/bits/stl_tree.h:148: error: previous declaration 'std::_Rb_tree_node_base* std::_Rb_tree_decrement(std::_Rb_tree_node_base*)' here
/usr/include/c++/4.2.1/bits/stl_tree.h:153: error: template with C linkage
/usr/include/c++/4.2.1/bits/stl_tree.h:223: error: template with C linkage
/usr/include/c++/4.2.1/bits/stl_tree.h:298: error: template with C linkage
/usr/include/c++/4.2.1/bits/stl_tree.h:304: error: template with C linkage
/usr/include/c++/4.2.1/bits/stl_tree.h:329: error: template with C linkage
etc. etc.
It seems a bit presumptuous of mpi.h to include mpicxx.h just because __cplusplus is defined, since that makes it impossible to include C MPI code from C++ under C linkage. I've had to resort to something like

#ifdef __cplusplus
#undef __cplusplus
#include <mpi.h>
#define __cplusplus
#else
#include <mpi.h>
#endif

in c-code.h, which seems to work but isn't exactly smooth. Is there another way around this, or has linking C MPI code with C++ never come up before? Thanks, /Patrik Jonsson
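A smoother route, assuming the installed mpi.h honors Open MPI's OMPI_SKIP_MPICXX guard (an assumption worth verifying against your /sw/include/mpi.h), is to define that guard on the compile line so the C++ bindings are never pulled in:

```shell
# Hedged sketch: -DOMPI_SKIP_MPICXX asks Open MPI's mpi.h to skip the
# C++ bindings (mpicxx.h), so the header stays valid inside extern "C".
mpic++ -DOMPI_SKIP_MPICXX -I. -I$HOME/include/libPJutil -I$HOME/code/arepo \
    -m32 arepotest.cc -I$HOME/include -I/sw/include -L/sw/lib \
    -L$HOME/code/arepo -larepo -lhdf5 -lgsl -lgmp -lmpi
```

This avoids the fragile #undef __cplusplus dance, since the preprocessor symbol itself is left intact.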
Re: [OMPI users] Fwd: Problems with OpenMPI
Osvaldo,

These FAQs may help: http://www.open-mpi.org/faq/?category=rsh

Also, make sure the same Open MPI is either installed on, or accessible by (say via NFS), "anotherhost". Simple test: 'mpirun -np 8 --host localhost,anotherhost hostname'

I hope it helps,
Gus Correa

David Zhang wrote: Check firewall, network settings like the subnet, and ssh keys? On 8/31/10, Osvaldo Reis wrote:
> [quoted message snipped]
Re: [OMPI users] Fwd: Problems with OpenMPI
Thank you, David. The problem was my firewall: the server machine is new, and whoever installed the OS forgot to disable iptables. Thanks.

2010/8/31 David Zhang:
> Check firewall, network settings like the subnet, and ssh keys?
> [rest of quoted message snipped]

--
Osvaldo Reis Junior
Engenharia de Computação - UEPG
Laboratório de Genômica e Expressão - LGE
Universidade Estadual de Campinas - UNICAMP
MSN: osvaldorei...@hotmail.com
Skype: osvaldoreisss
Cel: (19) 8128-5273
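For anyone hitting the same symptom, the fix described above can be sketched as follows (assuming a RHEL/CentOS-style iptables service; adapt the commands to your distribution):

```shell
# Hedged sketch of the iptables fix (RHEL/CentOS-style service names assumed).
# mpirun launches remote daemons that connect back over arbitrary TCP ports,
# so a default-deny firewall makes the remote daemon hang exactly as shown.
service iptables status      # check whether the firewall is active
service iptables stop        # disable it for this session
chkconfig iptables off       # keep it disabled across reboots
```

On a shared cluster, opening the relevant port range is preferable to disabling the firewall outright.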
Re: [OMPI users] Fwd: Problems with OpenMPI
Check firewall, network settings like the subnet, and ssh keys?

On 8/31/10, Osvaldo Reis wrote:
> Hi, I wanted to use openmpi [...]
> [quoted message snipped]

--
Sent from my mobile device

David Zhang
University of California, San Diego
[OMPI users] Fwd: Problems with OpenMPI
Hi, I wanted to use Open MPI. I installed it with no errors, and when I run the examples locally everything works well, but when I specify another host it doesn't work. There are no errors, but it doesn't show anything: it doesn't start the process on the other host, and it doesn't stop running on localhost. Then I press Ctrl+C to kill the process and it shows:

^Cmpirun: killing job...
--
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--
--
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--
anotherhost - daemon did not report back when launched

I looked at /var/log/secure and it showed the connection starting, but then it closed without executing anything.

Running on localhost:

[user@host1 examples]$ mpirun -np 8 --host localhost ./hello_c
Hello, world, I am 0 of 8
Hello, world, I am 1 of 8
Hello, world, I am 2 of 8
Hello, world, I am 4 of 8
Hello, world, I am 5 of 8
Hello, world, I am 6 of 8
Hello, world, I am 7 of 8
Hello, world, I am 3 of 8

Running on another host:

[user@host1 examples]$ mpirun -np 8 --host anotherhost ./hello_c

Some help please!

--
Osvaldo Reis Junior
Engenharia de Computação - UEPG
Laboratório de Genômica e Expressão - LGE
Universidade Estadual de Campinas - UNICAMP
MSN: osvaldorei...@hotmail.com
Skype: osvaldoreisss
Cel: (19) 8128-5273
[OMPI users] [R] Short survey concerning the use of software engineering in the field of High Performance Computing
Dear Colleagues, this is a short survey (21 questions that take about 10 minutes to answer) in the context of the research work for my PhD thesis and the Munich Center of Advanced Computing (Project B2). It would be very helpful if you would take the time to answer my questions concerning the use of software engineering in the field of High Performance Computing. Please note that all questions are mandatory. http://www.q-set.de/q-set.php?sCode=TCSBHMPZAASZ Thank you very much, kind regards Miriam Schmidberger (Dipl. Medien-Inf.) schmi...@in.tum.de Technische Universität München Institut für Informatik Boltzmannstr. 3 85748 Garching Germany Office 01.07.037 Tel: +49 (89) 289-18226
Re: [OMPI users] Memory allocation error when linking with MPI libraries
Hi, thanks Nysal for these details. I also fixed my memory allocation issue using the environment variable OMPI_MCA_memory_ptmalloc2_disable, which is much easier (at least in my case) than compiling a new Open MPI package and installing it. The point is that it is a bit complicated to find information about this variable (it seems to be a secret variable!). Actually, I have read that it cannot be used as a normal MCA parameter and cannot be set in a configuration file ( http://www.open-mpi.org/community/lists/users/2010/06/13208.php ). When using this variable, I have added the -x OMPI_MCA_memory_ptmalloc2_disable option to my mpirun command line. Do I really have to do that? Is the environment variable (plus the -x option if required) still the only way to set this parameter to 1?

Regards, Nicolas

2010/8/15 Nysal Jan:
> >What does it exactly imply to compile with this option ?
> Open MPI's internal malloc library (ptmalloc) will not be built/used. If you are using an RDMA-capable interconnect such as InfiniBand, you will not be able to use the "mpi_leave_pinned" feature. mpi_leave_pinned might improve performance for applications that reuse/repeatedly send from the same buffer. If you are not using such interconnects then there is no impact on performance. For more details see the FAQ entries (24-28):
> http://www.open-mpi.org/faq/?category=openfabrics#large-message-leave-pinned
>
> --Nysal
>
> On Thu, Aug 12, 2010 at 6:30 PM, Nicolas Deladerriere < nicolas.deladerri...@gmail.com> wrote:
>> Building openmpi with the option "--without-memory-manager" fixed my problem.
>>
>> What does it exactly imply to compile with this option? I guess all mallocs use functions from libc instead of Open MPI's own, but does it have an effect on performance or something else?
>>
>> Nicolas
>>
>> 2010/8/8 Nysal Jan
>>> What interconnect are you using? InfiniBand? Use the "--without-memory-manager" option while building ompi in order to disable ptmalloc.
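The workaround described above can be sketched as follows (./my_app is a placeholder application name):

```shell
# Disable Open MPI's internal ptmalloc2 allocator via the environment
# variable discussed in the thread, and forward it to the remote ranks
# with -x, since it reportedly cannot be set as a normal MCA parameter.
export OMPI_MCA_memory_ptmalloc2_disable=1
mpirun -np 8 -x OMPI_MCA_memory_ptmalloc2_disable ./my_app
```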
>>> Regards
>>> --Nysal
>>>
>>> On Sun, Aug 8, 2010 at 7:49 PM, Nicolas Deladerriere < nicolas.deladerri...@gmail.com> wrote:

Yes, I'm using a 24G machine on 64-bit Linux OS. If I compile without the wrapper, I do not get any problems. It seems that when I link with Open MPI, my program uses a kind of Open MPI-implemented malloc. Is it possible to switch it off in order to only use malloc from libc?

Nicolas

2010/8/8 Terry Frankcombe

> You're trying to do a 6GB allocate. Can your underlying system handle
> that? If you compile without the wrapper, does it work?
>
> I see your executable is using the OMPI memory stuff. IIRC there are
> switches to turn that off.
>
> On Fri, 2010-08-06 at 15:05 +0200, Nicolas Deladerriere wrote:
> > Hello,
> >
> > I'm having a sigsegv error when using a simple program compiled and
> > linked with openmpi. I have reproduced the problem using really simple
> > Fortran code. It actually does not even use MPI, but just links with the
> > mpi shared libraries. (The problem does not appear when I do not link
> > with the mpi libraries.)
> >
> > % cat allocate.F90
> > program test
> > implicit none
> > integer, dimension(:), allocatable :: z
> > integer(kind=8) :: l
> >
> > write(*,*) "l ?"
> > read(*,*) l
> >
> > ALLOCATE(z(l))
> > z(1) = 111
> > z(l) = 222
> > DEALLOCATE(z)
> >
> > end program test
> >
> > I am using openmpi 1.4.2 and gfortran for my tests. Here is the
> > compilation:
> >
> > % ./openmpi-1.4.2/build/bin/mpif90 --showme -g -o testallocate allocate.F90
> > gfortran -g -o testallocate allocate.F90
> > -I/s0/scr1/TOMOT_19311_HAL_/openmpi-1.4.2/build/include -pthread
> > -I/s0/scr1/TOMOT_19311_HAL_/openmpi-1.4.2/build/lib
> > -L/s0/scr1/TOMOT_19311_HAL_/openmpi-1.4.2/build/lib -lmpi_f90
> > -lmpi_f77 -lmpi -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl
> > -lutil -lm -ldl -pthread
> >
> > When I run that test with different lengths, I sometimes get a
> > "Segmentation fault" error. Here are two examples using two specific
> > values, but the error happens for many other values of the length (I did
> > not manage to find which values of the length give the error):
> >
> > % ./testallocate
> > l ?
> > 16
> > Segmentation fault
> > % ./testallocate
> > l ?
> > 20
> >
> > I used the debugger with a re-compiled version of openmpi using the debug
> > flag. I got the following error in function sYSMALLOc:
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x2b70b3b3 in sYSMALLOc (nb=640016, av=0x2b930200)
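The rebuild suggested in the quoted exchange can be sketched as follows (the install prefix is a placeholder):

```shell
# Rebuild Open MPI without its internal memory manager, so linked
# applications fall back to libc's malloc (Nysal's suggestion above).
./configure --prefix=$HOME/openmpi-nomem --without-memory-manager
make all install
```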
Re: [OMPI users] High Checkpoint Overhead Ratio
Have you tried testing without using the NFS? So setting the mca-params.conf to something like:

crs_base_snapshot_dir=/tmp/
snapc_base_global_snapshot_dir=/tmp/global
snapc_basee_store_in_place=0

This would remove the NFS time from the checkpoint time. However, if you are using staging this may or may not reduce the application overhead significantly. If you want to save to NFS, and it is globally mounted, you could try setting the 'snapc_base_global_shared' parameter (deprecated in the trunk), which tells the system to use standard UNIX copy commands (i.e., cp) instead of the rsh varieties. You might try changing the '--mca filem_rsh_max_incomming' parameter (default 10) to increase or decrease the number of concurrent rcp/scp operations. Something else to try is to look at the SnapC timing to pinpoint where the system is taking the most time:

snapc_full_enable_timing=1

Since you are using the C/R thread, it takes up some CPU cycles that may interfere with application performance. You can adjust the aggressiveness of this thread via the 'opal_cr_thread_sleep_wait' parameter. In 1.5.0 it defaults to 0 microseconds, but on the trunk this has been adjusted to 1000 microseconds. Try setting the parameter:

opal_cr_thread_sleep_wait=1000

Depending on how much memory is required by CG.C and available on each node, you may be hitting a memory barrier that BLCR is struggling to overcome. What happens if you reduce the number of processes per node?

Those are some things to play around with to see what works best for your system and application. For a full list of parameters available in the C/R infrastructure see the link below:

http://osl.iu.edu/research/ft/ompi-cr/api.php

-- Josh

On Aug 30, 2010, at 11:08 PM, 陈文浩 wrote:
> Dear OMPI Users,
>
> I'm now using BLCR-0.8.2 and OpenMPI-1.5rc5. The problem is that it takes a very long time to checkpoint.
>
> BLCR configuration:
> ./configure --prefix=/opt/blcr --enable-static
> Open MPI configuration:
> ./configure --prefix=/opt/ompi --with-ft=cr --with-blcr=/opt/blcr --enable-static --enable-ft-thread --enable-mpi-threads
>
> Our blades use NFS. $HOME and /opt are shared.
>
> In $HOME/.openmpi/mca-params.conf:
> crs_base_snapshot_dir=/tmp/
> snapc_base_global_snapshot_dir=/home/chenwh
> snapc_basee_store_in_place=0
>
> Now I run the CG NPB (NPROCS=16, CLASS=C) on two nodes (blade02, blade04).
> With no checkpoint, 'Time in seconds' is about 100s. That's normal.
> But when I take a single checkpoint, 'Time in seconds' goes up to 300s. The overhead ratio is over 200%! Why? How can I improve it?
>
> blade02:~> ompi-checkpoint --status 27115
> [blade02:27130] [  0.00 /   0.25] Requested - ...
> [blade02:27130] [  0.00 /   0.25]   Pending - ...
> [blade02:27130] [  0.21 /   0.46]   Running - ...
> [blade02:27130] [221.25 / 221.71]  Finished - ompi_global_snapshot_27115.ckpt
> Snapshot Ref.: 0 ompi_global_snapshot_27115.ckpt
>
> As you can see, it takes 200+ seconds to checkpoint. By the way, what do the former and latter numbers in [ , ] represent?
>
> Regards
>
> Whchen

Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://www.cs.indiana.edu/~jjhursey
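Josh's local-disk test configuration can be sketched as a single file write (parameter names reproduced exactly as given in the thread; the target path here is a scratch copy, while the real file would be $HOME/.openmpi/mca-params.conf):

```shell
# Write the suggested checkpoint parameters to a scratch mca-params.conf
# so they can be reviewed before replacing the real one.
cat > /tmp/mca-params.conf <<'EOF'
crs_base_snapshot_dir=/tmp/
snapc_base_global_snapshot_dir=/tmp/global
snapc_basee_store_in_place=0
snapc_full_enable_timing=1
opal_cr_thread_sleep_wait=1000
EOF
grep -c '=' /tmp/mca-params.conf
```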
[OMPI users] Checksuming in openmpi 1.4.1
Bonjour, I'm not sure I understand how to trigger CHECKSUM use inside of OpenMPI 1.4.1 (after digging in the FAQs, I found no explanation, sorry):
- Is checksumming activated by default and embedded automatically within the Send/Recv pair mechanism?
- If not, which MCA param(s) should I set to activate it?
- Is there a time penalty for using it?
Thanks in advance for any help.
--
Regards, Gilbert.
--
*-*
Gilbert Grosdidier gilbert.grosdid...@in2p3.fr
LAL / IN2P3 / CNRS  Phone : +33 1 6446 8909
Faculté des Sciences, Bât. 200  Fax : +33 1 6446 8546
B.P. 34, F-91898 Orsay Cedex (FRANCE)
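Not an authoritative answer, but for reference: Open MPI 1.4.x ships a checksum-enabled point-to-point (PML) component named 'csum', which is not selected by default; assuming that is the mechanism meant here, enabling it at run time would look like the sketch below (./my_app is a placeholder).

```shell
# Hedged sketch: select the checksum-enabled point-to-point layer (pml/csum).
# Expect some time penalty, since every message payload gets checksummed.
mpirun --mca pml csum -np 4 ./my_app
```

`ompi_info | grep pml` would confirm whether the csum component is present in a given build.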
[OMPI users] High Checkpoint Overhead Ratio
Dear OMPI Users,

I'm now using BLCR-0.8.2 and OpenMPI-1.5rc5. The problem is that it takes a very long time to checkpoint.

BLCR configuration:
./configure --prefix=/opt/blcr --enable-static
Open MPI configuration:
./configure --prefix=/opt/ompi --with-ft=cr --with-blcr=/opt/blcr --enable-static --enable-ft-thread --enable-mpi-threads

Our blades use NFS. $HOME and /opt are shared.

In $HOME/.openmpi/mca-params.conf:
crs_base_snapshot_dir=/tmp/
snapc_base_global_snapshot_dir=/home/chenwh
snapc_basee_store_in_place=0

Now I run the CG NPB (NPROCS=16, CLASS=C) on two nodes (blade02, blade04).
With no checkpoint, 'Time in seconds' is about 100s. That's normal.
But when I take a single checkpoint, 'Time in seconds' goes up to 300s. The overhead ratio is over 200%! Why? How can I improve it?

blade02:~> ompi-checkpoint --status 27115
[blade02:27130] [  0.00 /   0.25] Requested - ...
[blade02:27130] [  0.00 /   0.25]   Pending - ...
[blade02:27130] [  0.21 /   0.46]   Running - ...
[blade02:27130] [221.25 / 221.71]  Finished - ompi_global_snapshot_27115.ckpt
Snapshot Ref.: 0 ompi_global_snapshot_27115.ckpt

As you can see, it takes 200+ seconds to checkpoint. By the way, what do the former and latter numbers in [ , ] represent?

Regards

Whchen