Re: [OMPI users] How can I tell (open-)mpi about the HW topology of my system?
The short answer is that OMPI currently does not remap ranks during MPI_CART_CREATE, even if you pass reorder==1. :-\ The reason is that we've had very few requests to do so. However, we did have the foresight (if I do say so myself ;-) ) to make the MPI topology system a plugin in Open MPI. The only plugin for this system is currently the "do nothing" plugin, but it would *not* be difficult to write one that actually does something meaningful on your torus. If you're interested, I'd be happy to explain how to do it (and we should probably move to the devel list). OMPI doesn't require too much framework code; I would guess that the majority of the code would actually be implementing whatever algorithms you want for your torus. Heck, you could even write a blind-and-dumb algorithm that just looks up tables in files based on the hostnames in your torus.

On Oct 23, 2009, at 7:54 AM, Luigi Scorzato wrote:

Hi everybody,

The short question is: how can I tell (open-)mpi about the HW topology of my system? The longer form is the following. I have a cluster which is physically connected in a 3D torus topology (say 5x3x2). The nodes have names node_000, node_001, ... node_421. I can use a rankfile to assign a fixed MPI rank to each node, e.g.:

rank 0 = node_000
rank 1 = node_001
rank 2 = node_010
...

However, in general, nothing forces e.g. MPI_Cart_create() to build the 3D grid I want, i.e. coord[node_ijk] = (i,j,k), rather than, say, coord[node_000] = (0,0,0), coord[node_001] = (1,0,0), coord[node_010] = (2,0,0), ..., which would be wrongly mapped to the physical topology. How can I bind at least MPI_Cart_create() to the topology I want? Of course it would be nice to use an MPI-compliant procedure, if it exists. If not, I am also happy with something that works at least with some version of open-mpi.

Note: for some reason too long to explain, I cannot rely on a system that tests the connections at the beginning. But there is no reason to do these tests, since I know my topology exactly.

Thanks in advance for any help!
Luigi

--
Jeff Squyres
jsquy...@cisco.com
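To make Luigi's goal concrete: with a rankfile pinning each rank to the matching node as above, passing reorder = 0 already yields the coord[node_ijk] = (i,j,k) mapping he wants, because MPI_Cart_create assigns coordinates to ranks in row-major order. Below is a minimal sketch; the 5x3x2 dims and node naming come from his message, and everything else (including the launch line afterwards) is illustrative rather than a tested recipe.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, coords[3];
        /* 5x3x2 torus, periodic in all three dimensions.  reorder = 0
         * keeps ranks exactly as the rankfile placed them (and, per the
         * reply above, current OMPI would not remap them anyway). */
        int dims[3]    = {5, 3, 2};
        int periods[3] = {1, 1, 1};
        MPI_Comm cart;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, 0, &cart);

        /* Row-major assignment: rank 1 reports (0,0,1), matching the
         * rankfile's "rank 1 = node_001"; rank 2 reports (0,1,0). */
        MPI_Cart_coords(cart, rank, 3, coords);
        printf("rank %d -> (%d,%d,%d)\n", rank, coords[0], coords[1], coords[2]);

        MPI_Comm_free(&cart);
        MPI_Finalize();
        return 0;
    }

Launched with something like "mpirun -np 30 -rf myrankfile ./a.out" (the v1.3 series' rankfile option), the physical and logical topologies then agree without any remapping plugin.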
Re: [OMPI users] memchecker overhead?
On Mon, 2009-10-26 at 16:21 -0400, Jeff Squyres wrote:
> there's a tiny/small amount of overhead inserted by OMPI telling
> Valgrind "this memory region is ok", but we live in an intensely
> competitive HPC environment.

I may be wrong, but I seem to remember Julian saying the overhead is twelve cycles for the valgrind calls. Of course, calculating what to pass to valgrind may add to this.

> The option to enable this Valgrind Goodness in OMPI is --with-valgrind.
> I *think* the option may be the same for libibverbs, but I don't
> remember offhand.
>
> That being said, I'm guessing that we still have bunches of other
> valgrind warnings that may be legitimate. We can always use some help
> to stamp out these warnings... :-)

I note there is a bug for this; being "Valgrind clean" is a very desirable feature for any software, and particularly a library, IMHO.

https://svn.open-mpi.org/trac/ompi/ticket/1720

Ashley.

--
Ashley Pittman, Bath, UK.
Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk
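For context, the calls in question are Memcheck's client-request macros from <valgrind/memcheck.h>; when the program is not running under Valgrind they reduce to a short magic-instruction sequence, which is consistent with the cycle count quoted above. A hypothetical sketch of the kind of annotation involved (the wrapper function is illustrative, not OMPI code):

    #include <stddef.h>
    #include <valgrind/memcheck.h>

    /* After an OS-bypass NIC has written into 'buf' behind Valgrind's
     * back, tell Memcheck those bytes are now initialized so later
     * reads don't trigger bogus "uninitialised value" warnings. */
    static void mark_received(void *buf, size_t len)
    {
        VALGRIND_MAKE_MEM_DEFINED(buf, len);
    }

The per-call cost is fixed; as Ashley notes, whatever bookkeeping is needed to compute the address/length arguments comes on top of it.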
Re: [OMPI users] memchecker overhead?
Jeff Squyres wrote:
> Verbs and Open MPI don't have these options on by default because a)
> you need to compile against Valgrind's header files to get them to
> work, and b) there's a tiny/small amount of overhead inserted by OMPI
> telling Valgrind "this memory region is ok", but we live in an
> intensely competitive HPC environment.

It's certainly competitive, but we spend most of our implementation time getting things correct rather than tuning. The huge speed benefits come from algorithmic advances, and finding bugs quickly makes the implementation of new algorithms easier. I'm not arguing that it should be on by default, but it's helpful to have an environment where the lower-level libs are valgrind-clean. These days, I usually revert to MPICH when hunting something with valgrind, but use OMPI most other times.

> The option to enable this Valgrind Goodness in OMPI is --with-valgrind.
> I *think* the option may be the same for libibverbs, but I don't
> remember offhand.

I see plenty of warnings from the sm btl. Several variations, including the excessive

  --enable-debug --enable-mem-debug --enable-mem-profile \
  --enable-memchecker --with-valgrind=/usr

were not sufficient. (I think everything in this line except --with-valgrind increases the number of warnings, but it's nontrivial with plain --with-valgrind.)

Thanks,
Jed
Re: [OMPI users] memchecker overhead?
There's a whole class of valgrind warnings that are generated when you use OS-bypass networks like OpenFabrics. The verbs library and Open MPI can be configured and compiled with additional instructions that tell Valgrind where the "problematic" spots are, and that the memory is actually ok (because it's memory that came from outside of Valgrind's scope of influence).

Verbs and Open MPI don't have these options on by default because a) you need to compile against Valgrind's header files to get them to work, and b) there's a tiny/small amount of overhead inserted by OMPI telling Valgrind "this memory region is ok", but we live in an intensely competitive HPC environment.

The option to enable this Valgrind Goodness in OMPI is --with-valgrind. I *think* the option may be the same for libibverbs, but I don't remember offhand.

That being said, I'm guessing that we still have bunches of other valgrind warnings that may be legitimate. We can always use some help to stamp out these warnings... :-)

On Oct 26, 2009, at 4:09 PM, Jed Brown wrote:

Samuel K. Gutierrez wrote:
> Hi Jed,
>
> I'm not sure if this will help, but it's worth a try. Turn off OMPI's
> memory wrapper and see what happens.
>
> c-like shell
> setenv OMPI_MCA_memory_ptmalloc2_disable 1
>
> bash-like shell
> export OMPI_MCA_memory_ptmalloc2_disable=1
>
> Also add the following MCA parameter to your run command.
>
> --mca mpi_leave_pinned 0

Thanks for the tip, but these make very little difference.

Jed

--
Jeff Squyres
jsquy...@cisco.com
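The reason a default build can avoid even that tiny cost entirely is that such annotations are conventionally hidden behind a compile-time guard, so a build without --with-valgrind compiles them away to nothing. A hypothetical sketch of the pattern (all macro and symbol names here are illustrative, not OMPI's actual identifiers):

    #ifdef WANT_MEMCHECKER             /* set by something like --with-valgrind */
    #include <valgrind/memcheck.h>
    #define MEMCHECK_DEFINED(p, n)   VALGRIND_MAKE_MEM_DEFINED((p), (n))
    #define MEMCHECK_NOACCESS(p, n)  VALGRIND_MAKE_MEM_NOACCESS((p), (n))
    #else                              /* default build: expands to nothing */
    #define MEMCHECK_DEFINED(p, n)   do { } while (0)
    #define MEMCHECK_NOACCESS(p, n)  do { } while (0)
    #endif

This also explains point a) above: the guarded branch needs Valgrind's headers at compile time, but adds no runtime dependency.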
Re: [OMPI users] memchecker overhead?
Samuel K. Gutierrez wrote:
> Hi Jed,
>
> I'm not sure if this will help, but it's worth a try. Turn off OMPI's
> memory wrapper and see what happens.
>
> c-like shell
> setenv OMPI_MCA_memory_ptmalloc2_disable 1
>
> bash-like shell
> export OMPI_MCA_memory_ptmalloc2_disable=1
>
> Also add the following MCA parameter to your run command.
>
> --mca mpi_leave_pinned 0

Thanks for the tip, but these make very little difference.

Jed
Re: [OMPI users] memchecker overhead?
Hi Jed,

I'm not sure if this will help, but it's worth a try. Turn off OMPI's memory wrapper and see what happens.

c-like shell:
setenv OMPI_MCA_memory_ptmalloc2_disable 1

bash-like shell:
export OMPI_MCA_memory_ptmalloc2_disable=1

Also add the following MCA parameter to your run command:

--mca mpi_leave_pinned 0

--
Samuel K. Gutierrez
Los Alamos National Laboratory

On Oct 26, 2009, at 1:41 PM, Jed Brown wrote:

Jeff Squyres wrote:
> Using --enable-debug adds in a whole pile of developer-level run-time
> checking and whatnot. You probably don't want that on production runs.

I have found that --enable-debug --enable-memchecker actually produces more valgrind noise than leaving them off. Are there options to make Open MPI strict about initializing and freeing memory? At one point I tried to write policy files, but even with judicious globbing, I kept getting different warnings when run on a different program. (All these codes were squeaky-clean under MPICH2.)

Jed
Re: [OMPI users] memchecker overhead?
Jeff Squyres wrote:
> Using --enable-debug adds in a whole pile of developer-level run-time
> checking and whatnot. You probably don't want that on production runs.

I have found that --enable-debug --enable-memchecker actually produces more valgrind noise than leaving them off. Are there options to make Open MPI strict about initializing and freeing memory? At one point I tried to write policy files, but even with judicious globbing, I kept getting different warnings when run on a different program. (All these codes were squeaky-clean under MPICH2.)

Jed
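For anyone attempting the same route: the "policy files" here are Valgrind suppression files, passed with --suppressions=<file>. A minimal hypothetical entry (the rule name and shared-object glob are illustrative, not a guaranteed install path):

    # hypothetical rule: silence conditional-jump warnings whose stack
    # ends anywhere inside the sm BTL's shared object
    {
       ompi_sm_btl_noise
       Memcheck:Cond
       ...
       obj:*/mca_btl_sm.so
    }

The frame-level "..." wildcard (Valgrind 3.4 and later) matches any number of intervening frames, which makes rules considerably less program-specific than matching exact call chains -- likely the root of the "different warnings on a different program" problem.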
Re: [OMPI users] memchecker overhead?
On Oct 26, 2009, at 3:29 PM, Jeff Squyres wrote:

> On Oct 26, 2009, at 3:23 PM, Brock Palen wrote:
>
>> Is there a large overhead for --enable-debug --enable-memchecker?
>
> --enable-debug, yes, there is a pretty large penalty. --enable-debug is
> really only intended for Open MPI developers. If you just want an OMPI
> that was compiled with debugging symbols, then just add -g to the
> CFLAGS/CXXFLAGS in OMPI's configure, perhaps like this:

Interesting, we were just looking at the memchecker functionality and don't want to double the number of MPI builds we offer. In the Debugging FAQ, section 10, http://www.open-mpi.org/faq/?category=debugging#memchecker_how says you need --enable-debug to use --enable-memchecker; is this really the case then?

> shell$ ./configure CFLAGS=-g CXXFLAGS=-g ...
>
> Using --enable-debug adds in a whole pile of developer-level run-time
> checking and whatnot. You probably don't want that on production runs.
>
> I'll let the HLRS guys comment on the cost of --enable-memchecker; I
> suspect the answer will be "it depends".
>
> --
> Jeff Squyres
> jsquy...@cisco.com
Re: [OMPI users] compiling openmpi with mixed CISCO infiniband cards and Mellanox infiniband cards
On Oct 16, 2009, at 1:55 PM, nam kim wrote:

> Our school has a cluster running over CISCO based Infiniband cards and
> switch. Recently, we purchased more computing nodes with Mellanox cards,
> since CISCO has stopped making IB cards.

Sorry for the delay in replying; my INBOX has grown totally out of hand recently. :-( FWIW, Cisco never made IB HCAs; we simply resold Mellanox HCAs.

> Currently, I use openmpi 1.2.8 compiled with a CISCO IB card
> (SFS-HCA-320-A1) with the topspin driver. My questions are:
>
> 1. Is it possible to compile the 1.3 version with mixed cisco IB and
> mellanox IB (MHRH19-XTC) with the open infiniband libraries?

Do you mean: is it possible to use Open MPI 1.3.x with a recent OFED distribution across multiple nodes, some of which include Cisco-branded HCAs and some of which include Mellanox HCAs? The answer is: most likely, yes. Open MPI doesn't fully support "heterogeneous" HCAs (e.g., HCAs that would require different MTUs), but I suspect that your HCAs are all "close enough" that it won't matter. FWIW, on my 64-node MPI testing cluster at Cisco, I do similar things -- I have various Cisco and Mellanox HCAs of different generations and specific capabilities, and Open MPI runs fine.

> 2. Is it possible to compile 1.2.8 with mixed cisco IB and mellanox IB,
> and if so, how?

If you can, I'd highly suggest upgrading to the Open MPI v1.3 series.

--
Jeff Squyres
jsquy...@cisco.com
[OMPI users] memchecker overhead?
Is there a large overhead for --enable-debug --enable-memchecker?

Reading http://www.open-mpi.org/faq/?category=debugging it sounds like there is and there isn't. What should I expect if we build all of our MPI libraries with those options and run normally:

mpirun ./myexe

versus using a library that was not built with those options?

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734) 936-1985
Re: [OMPI users] Openmpi not using IB and no warning message
On Oct 15, 2009, at 2:14 AM, Sangamesh B wrote:

> I've run ibpingpong tests. They are working fine.

Sorry for the delay in replying. Good.

> Are there any additional tests available which will make sure that
> "there is no problem with IB software and Open MPI; the problem is with
> the application or IB hardware"?

George mentioned the point that using "--mca btl openib,self" will only allow OMPI to use those two networks. So you should be good there -- with those command line options, it'll either run on IB or it will fail to run if the IB is not working. Unfortunately, OMPI currently only has a negative acknowledgement when you're *not* using high-performance networks -- it doesn't give you a positive acknowledgement when it *is* using a high-performance network (because this is the much more common case).

> Because we've faced some critical problems:
>
> http://www.open-mpi.org/community/lists/users/2009/10/10843.php

This one *appears* to be an application issue. But there was no information provided beyond the initial posting, so it's impossible to say.

> http://www.open-mpi.org/community/lists/users/2009/09/10700.php

Pasha had a good reply to this post:

http://www.open-mpi.org/community/lists/users/2009/09/10705.php

If he's right (and he usually is :-) ), then one of your IB ports went from ACTIVE to DOWN during the run, potentially indicating bad hardware (i.e., Open MPI simply reported the error -- it's possible/likely that Open MPI didn't *cause* the error). Pasha suggested using ibdiagnet to verify your fabric. Failing that, you might want to contact your IB/cluster vendor for assistance with a layer-0 diagnostic of your IB fabric.

Hope that helps!

--
Jeff Squyres
jsquy...@cisco.com
[OMPI users] segmentation fault: Address not mapped
Dear list members,

I am using openmpi 1.3.3 with OFED on an HP cluster with Red Hat Linux. Occasionally (not always) I get a crash with the following message:

[hydra11:09312] *** Process received signal ***
[hydra11:09312] Signal: Segmentation fault (11)
[hydra11:09312] Signal code: Address not mapped (1)
[hydra11:09312] Failing at address: 0xab5f30a8
[hydra11:09312] [ 0] /lib64/libpthread.so.0 [0x3c1400e4c0]
[hydra11:09312] [ 1] /home/ipl/openmpi-1.3.3/platforms/hp/lib/libmpi.so.0(MPI_Isend+0x93) [0x2af1be45a3e3]
[hydra11:09312] [ 2] ./flow(MP_SendReal+0x60) [0x6bc993]
[hydra11:09312] [ 3] ./flow(SendRealsAlongFaceWithOffset_3D+0x4ab) [0x68ba19]
[hydra11:09312] [ 4] ./flow(MP_SendVertexArrayBlock+0x23d) [0x6891e1]
[hydra11:09312] [ 5] ./flow(MB_CommAllVertex+0x65) [0x6848ba]
[hydra11:09312] [ 6] ./flow(MB_SetupVertexArray+0xd5) [0x68c837]
[hydra11:09312] [ 7] ./flow(MB_SetupGrid+0xa8) [0x68be51]
[hydra11:09312] [ 8] ./flow(SetGrid+0x58) [0x446224]
[hydra11:09312] [ 9] ./flow(main+0x148) [0x43b728]
[hydra11:09312] [10] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3c1341d974]
[hydra11:09312] [11] ./flow(__gxx_personality_v0+0xd9) [0x429b19]
[hydra11:09312] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 6 with PID 9312 on node hydra11 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

The crash does not appear always -- sometimes the application runs fine. However, it seems that the crash especially occurs when I run on more than 1 node. I have consulted the archive of open-mpi and have found many error messages of the same kind, but none from the 1.3.3 version, and none of direct relevance. I would really appreciate comments on this.

Below is the information required according to the openmpi web.

Config.log: attached (config.zip)

Open MPI was configured with a prefix and with the path to openib, and with the following compiler flags:

setenv CC gcc
setenv CFLAGS '-O'
setenv CXX g++
setenv CXXFLAGS '-O'
setenv F77 'gfortran'
setenv FFLAGS '-O'

ompi_info -all: attached

The application (named flow) was launched on hydra11 by:

nohup mpirun -H hydra11,hydra12 -np 8 ./flow caseC.in &

The PATH and LD_LIBRARY_PATH on hydra11 and hydra12:

PATH=/home/ipl/openmpi-1.3.3/platforms/hp/bin
LD_LIBRARY_PATH=/home/ipl/openmpi-1.3.3/platforms/hp/lib

OpenFabrics version: 1.4
Linux: X86_64-redhat-linux/3.4.6
ibv_devinfo, hydra11: attached
ibv_devinfo, hydra12: attached
ifconfig, hydra11: attached
ifconfig, hydra12: attached
ulimit -l (hydra11): 600
ulimit -l (hydra12): unlimited

Furthermore, I can say that I have not specified any MCA parameters. The application which I am running (named flow) is linked from fortran, c and c++ libraries with the following:

/home/ipl/openmpi-1.3.3/platforms/hp/bin/mpicc -DMP -DNS3_ARCH_LINUX -DLAPACK -I/home/ipl/ns3/engine/include_forLinux -I/home/ipl/openmpi-1.3.3/platforms/hp/include -c -o user_small_3D.o user_small_3D.c
rm -f flow
/home/ipl/openmpi-1.3.3/platforms/hp/bin/mpicxx -o flow user_small_3D.o -L/home/ipl/ns3/engine/lib_forLinux -lns3main -lns3pars -lns3util -lns3vofl -lns3turb -lns3solv -lns3mesh -lns3diff -lns3grid -lns3line -lns3data -lns3base -lfitpack -lillusolve -lfftpack_small -lfenton -lns3air -lns3dens -lns3poro -lns3sedi -llapack_small -lblas_small -lm -lgfortran /home/ipl/ns3/engine/lib_Tecplot_forLinux/tecio64.a

Please let me know if you need more info!
Thanks in advance,
Iris Lohmann

Iris Pernille Lohmann
MSc, PhD
Ports & Offshore Technology (POT)

DHI
Agern Allé 5
DK-2970 Hørsholm
Denmark
Tel: +45 4516 9200
Direct: 45169427
i...@dhigroup.com
www.dhigroup.com
WATER * ENVIRONMENT * HEALTH
Re: [OMPI users] MPI-3 Fortran feedback
On Oct 25, 2009, at 11:38 PM, Steve Kargl wrote:

> There is currently a semi-heated debate in comp.lang.fortran concerning
> co-arrays and the upcoming Fortran 2008. Don't waste your time trying
> to decipher the thread; however, there appear to be a few knowledgeable
> MPI Fortranners hanging out there lately. Would Craig mind if I relay
> the above to c.l.f.? Of course, if Craig prefers not to veer into
> USENET, I can understand his decision.

The more feedback that we get, the better -- I don't have the cycles to read usenet, unfortunately. I don't know if Craig does (but I suspect that he does not). If they can reply here, on the blog post, or directly on the MPI-3 Fortran working group mailing list (linked to on the blog), that would be awesome.

Thanks!

--
Jeff Squyres
jsquy...@cisco.com
Re: [OMPI users] bug in MPI_Cart_create?
I can confirm that it is fixed on the trunk and will be included in the upcoming 1.3.4 release. The code now reads:

re_order = (0 == reorder) ? false : true;

Thanks for the heads-up!

On Oct 26, 2009, at 6:40 AM, Kiril Dichev wrote:

Hi David,

I believe this particular bug was fixed in the trunk some weeks ago, shortly before your post.

Regards,
Kiril

On Tue, 2009-10-13 at 17:54 +1100, David Singleton wrote:
> Looking back through the archives, a lot of people have hit error
> messages like
>
> [bl302:26556] *** An error occurred in MPI_Cart_create
> [bl302:26556] *** on communicator MPI_COMM_WORLD
> [bl302:26556] *** MPI_ERR_ARG: invalid argument of some other kind
> [bl302:26556] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>
> One of the reasons people *may* be hitting this is what I believe to
> be an incorrect test in MPI_Cart_create():
>
>     if (0 > reorder || 1 < reorder) {
>         return OMPI_ERRHANDLER_INVOKE (old_comm, MPI_ERR_ARG, FUNC_NAME);
>     }
>
> reorder is a "logical" argument, and "2.5.2 C bindings" in the MPI 1.3
> standard says:
>
>     Logical flags are integers with value 0 meaning "false" and a
>     non-zero value meaning "true."
>
> So I'm not sure there should be any argument test.
>
> We hit this because we (sorta erroneously) were trying to use a GNU
> build of Open MPI with Intel compilers. gfortran has true=1 while
> ifort has true=-1. It seems to all work (by luck, I know) except this
> test. Are there any other tests like this in Open MPI?
>
> David

--
Dipl.-Inf. Kiril Dichev
Tel.: +49 711 685 60492
E-mail: dic...@hlrs.de
High Performance Computing Center Stuttgart (HLRS)
Universität Stuttgart
70550 Stuttgart
Germany

--
Jeff Squyres
jsquy...@cisco.com
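To spell out the failure mode David describes, here is a small standalone sketch (not OMPI source) of why the old range check rejects a perfectly legal Fortran .true.:

    #include <stdio.h>

    /* A Fortran LOGICAL reaches the C bindings as an int whose "true"
     * value is compiler-dependent: gfortran passes 1, ifort passes -1. */
    int main(void)
    {
        int reorder = -1;  /* .true. as passed by ifort */

        /* Old check: rejects anything outside {0, 1}, so ifort's
         * legal .true. raises MPI_ERR_ARG. */
        if (0 > reorder || 1 < reorder)
            printf("old test: MPI_ERR_ARG for reorder=%d\n", reorder);

        /* Fix: per MPI's "2.5.2 C bindings", any nonzero int is true. */
        int re_order = (0 == reorder) ? 0 : 1;
        printf("fixed test: reorder=%d means %s\n",
               reorder, re_order ? "true" : "false");
        return 0;
    }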
Re: [OMPI users] MPI-3 Fortran feedback
On Fri, Oct 23, 2009 at 08:53:01AM -0400, Jeff Squyres wrote:
> If you're a Fortran MPI developer, I have a question for you.
>
> In the MPI-3 Forum, we're working on revamping the Fortran bindings to
> be "better" (for a variety of definitions of "better"). There's at
> least one question on which we really need some feedback from the MPI
> Fortran developer community before proceeding. Craig Rasmussen from
> Los Alamos National Laboratory, chair of the MPI-3 Fortran Working
> Group, asked me to post a "request for information" to my blog and
> pass on the URL to every Fortran MPI programmer that I know:
>
> http://blogs.cisco.com/ciscotalk/performance/comments/mpi-3_fortran_community_feedback_needed/
>
> Please go read that entry and let us know what you think.
>
> Many thanks!

Jeff,

There is currently a semi-heated debate in comp.lang.fortran concerning co-arrays and the upcoming Fortran 2008. Don't waste your time trying to decipher the thread; however, there appear to be a few knowledgeable MPI Fortranners hanging out there lately. Would Craig mind if I relay the above to c.l.f.? Of course, if Craig prefers not to veer into USENET, I can understand his decision.

--
Steve