On Thursday 10 December 2009 15:42:49 Mark Bolstad wrote:
> Just a quick interjection: I also have a dual-quad Nehalem system, HT on,
> 24GB RAM, hand-compiled 1.3.4 with options: --enable-mpi-threads
> --enable-mpi-f77=no --with-openib=no
>
> With v1.3.4 I see roughly the same behavior: hello and ring work;
> connectivity fails randomly with np >= 8. Turning on -v increased the
> success rate, but it still hangs. np = 16 fails more often, and which
> pair of processes is communicating when it hangs is random.
>
> However, it seems to be related to the shared-memory layer problem.
> Running with -mca btl ^sm works consistently through np = 128.
>
I have the same problem on the same machine (dual-quad Nehalem system, HT on).
For me the fix was the one from https://svn.open-mpi.org/trac/ompi/ticket/2043:

mpirun -np 8 -mca btl_sm_num_fifos 7

Mattijs

> Hope this helps.
>
> Mark
>
> On Wed, Dec 9, 2009 at 8:03 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
> > Hi Matthew
> >
> > Save for any misinterpretation I may have made of the code:
> >
> > Hello_c has no real communication, except for a final Barrier
> > synchronization.
> > Each process prints "hello world" and that's it.
> >
> > Ring probes a little more, with processes Send(ing) and
> > Recv(eiving) messages.
> > Ring just passes a message sequentially along all process
> > ranks, then back to rank 0, and repeats the game 10 times.
> > Rank 0 is in charge of counting turns, decrementing the counter,
> > and printing it (nobody else prints).
> > With 4 processes:
> > 0->1->2->3->0->1... 10 times
> >
> > In connectivity every pair of processes exchanges a message.
> > Therefore it probes all pairwise connections.
> > In verbose mode you can see that.
> >
> > These programs shouldn't hang at all if the system were sane.
> > Actually, they should even run with a significant level of
> > oversubscription; say, -np 128 should work easily for all three
> > programs on a powerful machine like yours.
> >
> > **
> >
> > Suggestions
> >
> > 1) Stick to the OpenMPI you compiled.
> >
> > **
> >
> > 2) You can run connectivity_c in verbose mode:
> >
> > /home/macmanes/apps/openmpi1.4/bin/mpirun -np 8 connectivity_c -v
> >
> > (Note the trailing "-v".)
> >
> > It should tell more about who's talking to whom.
> >
> > **
> >
> > 3) I wonder if there are any BIOS settings that may be required
> > (and perhaps not in place) to make the Nehalem hyperthreading
> > work properly on your computer.
> >
> > You reach the BIOS settings by typing <DEL> or <F2>
> > when the computer boots up.
> > The key varies by BIOS and computer vendor,
> > but shows quickly on the boot-up screen.
> >
> > You may ask the computer vendor about the recommended BIOS settings.
> > If you haven't done this before, be careful to change and save only
> > what really needs to change (if anything really needs to change),
> > or the result may be worse.
> > (Overclocking is for gamers, not for genome researchers ... :) )
> >
> > **
> >
> > 4) What I read about Nehalem DDR3 memory is that it is optimal
> > in configurations that are multiples of 3GB per CPU.
> > Common configs in dual-CPU machines like yours are
> > 6, 12, 24 and 48GB.
> > The sockets where you install the memory modules also matter.
> >
> > Your computer has 20GB.
> > Did you build the computer or upgrade the memory yourself?
> > Do you know how the memory is installed, and in which memory sockets?
> > What does the vendor have to say about it?
> >
> > See this:
> >
> > http://en.community.dell.com/blogs/dell_tech_center/archive/2009/04/08/nehalem-and-memory-configurations.aspx
> >
> > **
> >
> > 5) As I said before, typing "f" then "j" in "top" will add
> > a column (labeled "P") that shows on which core each process is running.
> > This will let you observe how the Linux scheduler is distributing
> > the MPI load across the cores.
> > Hopefully it is load-balanced, and different processes go to
> > different cores.
> >
> > ***
> >
> > It is very disconcerting when MPI processes hang.
> > You are not alone.
> > The reasons are not always obvious.
> > At least in your case there is no network involved to troubleshoot.
> >
> > **
> >
> > I hope it helps,
> >
> > Gus Correa
> > ---------------------------------------------------------------------
> > Gustavo Correa
> > Lamont-Doherty Earth Observatory - Columbia University
> > Palisades, NY, 10964-8000 - USA
> > ---------------------------------------------------------------------
> >
> > Matthew MacManes wrote:
> >> Hi Gus and List,
> >>
> >> First of all, Gus, I want to say thanks: you have been a huge help, and
> >> when I get this fixed, I owe you big time!
> >>
> >> However, the problems continue...
> >>
> >> I formatted the HD and reinstalled the OS to make sure that I was
> >> working from scratch. I did your step A, which seemed to go fine:
> >>
> >> macmanes@macmanes:~$ which mpicc
> >> /home/macmanes/apps/openmpi1.4/bin/mpicc
> >> macmanes@macmanes:~$ which mpirun
> >> /home/macmanes/apps/openmpi1.4/bin/mpirun
> >>
> >> Good stuff there...
> >>
> >> I then compiled the example files:
> >>
> >> macmanes@macmanes:~/Downloads/openmpi-1.4/examples$
> >> /home/macmanes/apps/openmpi1.4/bin/mpirun -np 8 ring_c
> >> Process 0 sending 10 to 1, tag 201 (8 processes in ring)
> >> Process 0 sent to 1
> >> Process 0 decremented value: 9
> >> Process 0 decremented value: 8
> >> Process 0 decremented value: 7
> >> Process 0 decremented value: 6
> >> Process 0 decremented value: 5
> >> Process 0 decremented value: 4
> >> Process 0 decremented value: 3
> >> Process 0 decremented value: 2
> >> Process 0 decremented value: 1
> >> Process 0 decremented value: 0
> >> Process 0 exiting
> >> Process 1 exiting
> >> Process 2 exiting
> >> Process 3 exiting
> >> Process 4 exiting
> >> Process 5 exiting
> >> Process 6 exiting
> >> Process 7 exiting
> >> macmanes@macmanes:~/Downloads/openmpi-1.4/examples$
> >> /home/macmanes/apps/openmpi1.4/bin/mpirun -np 8 connectivity_c
> >> Connectivity test on 8 processes PASSED.
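[Editor's note: the ring_c output above follows a simple pattern, described earlier in the thread: a counter starts at 10, and rank 0 decrements it each time the message completes a full loop around the ranks. A plain-Python sketch of that pattern (a simplified model only; the real ring_c is C code using MPI_Send/MPI_Recv):]

```python
# Simplified model of the ring_c message pattern: a counter is passed
# 0 -> 1 -> ... -> nprocs-1 -> 0; rank 0 decrements it on each full pass
# and stops when it reaches 0. This is NOT the real MPI program -- each
# "send" here is just moving to the next rank index.

def ring(nprocs, start=10):
    """Return the values rank 0 would report, one per full pass."""
    reported = []
    value = start
    rank = 0
    while True:
        rank = (rank + 1) % nprocs   # message hops to the next rank
        if rank == 0:                # back at rank 0: one full pass done
            value -= 1
            reported.append(value)
            if value == 0:
                break
    return reported

# Regardless of nprocs, rank 0 reports 9, 8, ..., 0 -- the ten
# "decremented value" lines in the mpirun output above.
```

This also shows why ring_c stresses the system far less than connectivity_c: ring only ever exercises neighbor-to-neighbor links, while connectivity exchanges a message between every one of the n*(n-1)/2 process pairs.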
> >> macmanes@macmanes:~/Downloads/openmpi-1.4/examples$
> >> /home/macmanes/apps/openmpi1.4/bin/mpirun -np 8 connectivity_c
> >> ...HANGS... NO OUTPUT
> >>
> >> This is maddening, because ring_c works, and connectivity_c worked the
> >> first time but not the second... I did it 10 times, and it worked twice.
> >> Here is the TOP screenshot:
> >>
> >> http://picasaweb.google.com/macmanes/DropBox?authkey=Gv1sRgCLKokNOVqo7BYw#5413382182027669394
> >>
> >> What is the difference between connectivity_c and ring_c? Under what
> >> circumstances should one fail and not the other?
> >>
> >> I'm off to the Linux forums to see about the Nehalem kernel issues.
> >>
> >> Matt
> >>
> >> On Wed, Dec 9, 2009 at 13:25, Gus Correa <g...@ldeo.columbia.edu> wrote:
> >>
> >> Hi Matthew
> >>
> >> There is no point in trying to troubleshoot MrBayes and ABySS
> >> if not even the OpenMPI test programs run properly.
> >> You must straighten them out first.
> >>
> >> **
> >>
> >> Suggestions:
> >>
> >> **
> >>
> >> A) While you are at OpenMPI, do yourself a favor
> >> and install it from source in a separate directory.
> >> Who knows if the OpenMPI package distributed with Ubuntu
> >> works right on Nehalem?
> >> Better to install OpenMPI yourself from source code.
> >> It is not a big deal, and may save you further trouble.
> >>
> >> Recipe:
> >>
> >> 1) Install gfortran and g++, if you don't have them, using apt-get.
> >> 2) Put the OpenMPI tarball in, say, /home/matt/downloads/openmpi
> >> 3) Make another install directory *not in the system directory tree*.
> >> Something like "mkdir /home/matt/apps/openmpi-X.Y.Z/" (X.Y.Z = version)
> >> will work.
> >> 4) cd /home/matt/downloads/openmpi
> >> 5) ./configure CC=gcc CXX=g++ F77=gfortran FC=gfortran \
> >>    --prefix=/home/matt/apps/openmpi-X.Y.Z
> >> (Use the prefix flag to install in the directory of item 3.)
> >> 6) make
> >> 7) make install
> >> 8) At the bottom of your /home/matt/.bashrc or .profile file
> >> put these lines:
> >>
> >> export PATH=/home/matt/apps/openmpi-X.Y.Z/bin:${PATH}
> >> export MANPATH=/home/matt/apps/openmpi-X.Y.Z/share/man:`man -w`
> >> export LD_LIBRARY_PATH=/home/matt/apps/openmpi-X.Y.Z/lib:${LD_LIBRARY_PATH}
> >>
> >> (If you use csh/tcsh, use instead:
> >> setenv PATH /home/matt/apps/openmpi-X.Y.Z/bin:${PATH}
> >> etc.)
> >>
> >> 9) Log out and log in again to refresh the environment variables.
> >> 10) Do "which mpicc" to check that it is pointing to your newly
> >> installed OpenMPI.
> >> 11) Recompile and rerun the OpenMPI test programs
> >> with 2, 4, 8, 16, ... processes.
> >> Use full path names to mpicc and to mpirun
> >> if the change of PATH above doesn't work right.
> >>
> >> ********
> >>
> >> B) Nehalem is quite new hardware.
> >> I don't know if the Ubuntu kernel 2.6.31-16 fully supports all
> >> of Nehalem's features, particularly hyperthreading and NUMA,
> >> which are used by MPI programs.
> >> I am not the right person to give you advice about this.
> >> I googled but couldn't find clear information about the
> >> minimal kernel version required to have Nehalem fully supported.
> >> Some Nehalem owner on the list could come forward and tell.
> >>
> >> **
> >>
> >> C) On the top screenshot you sent me: please try it again
> >> (after you do item A), but type "f" and "j" to show the processors
> >> that are running each process.
> >>
> >> **
> >>
> >> D) Also, the screenshot shows 20GB of memory.
> >> This does not sound like an optimal memory configuration for Nehalem,
> >> which tends to be 6GB, 12GB, 24GB, or 48GB.
> >> Did you put together the system or upgrade the memory yourself,
> >> or did you buy the computer as is?
> >> However, this should not break MPI anyway.
> >>
> >> **
> >>
> >> E) Answering your question:
> >> It is true that different flavors of MPI
> >> used to compile (mpicc) and run (mpiexec) a program would probably
> >> break right away, regardless of the number of processes.
> >> However, when it comes to different versions of the
> >> same MPI flavor (say OpenMPI 1.3.4 and OpenMPI 1.3.3),
> >> I am not sure it will break.
> >> I would guess it may run, but not in a reliable way.
> >> Problems may appear as you stress the system with more cores, etc.
> >> But this is just a guess.
> >>
> >> **
> >>
> >> I hope this helps,
> >>
> >> Gus Correa
> >> ---------------------------------------------------------------------
> >> Gustavo Correa
> >> Lamont-Doherty Earth Observatory - Columbia University
> >> Palisades, NY, 10964-8000 - USA
> >> ---------------------------------------------------------------------
> >>
> >> Matthew MacManes wrote:
> >>
> >> Hi Gus,
> >>
> >> Interestingly, the results for the connectivity_c test: it works
> >> fine with -np < 8. For -np > 8 it works some of the time; other
> >> times it HANGS. I have got to believe that this is a big clue!
> >> Also, when it hangs, sometimes I get the message "mpirun was
> >> unable to cleanly terminate the daemons on the nodes shown
> >> below". Note that NO nodes are shown below. Once I got -np 250
> >> to pass the connectivity test, but I was not able to replicate
> >> this reliably, so I'm not sure if it was a fluke or what. Here
> >> is a link to a screenshot of TOP when connectivity_c is hung
> >> with -np 14. I see that 2 processes are only at 50% CPU usage...
> >> Hmmmm
> >> http://picasaweb.google.com/lh/photo/87zVEucBNFaQ0TieNVZtdw?authkey=Gv1sRgCLKokNOVqo7BYw&feat=directlink
> >>
> >> The other tests, ring_c and hello_c, as well as the cxx versions of
> >> these guys, work with all values of -np.
> >>
> >> Using -mca mpi_paffinity_alone 1 I get the same behavior.
> >> I agree that I should worry about the mismatch between where
> >> the libraries are installed versus where I am telling my
> >> programs to look for them. Would this type of mismatch cause
> >> behavior like what I am seeing, i.e. working with a small
> >> number of processors but failing with a larger one? It seems like a
> >> mismatch would have the same effect regardless of the number of
> >> processors used. Maybe I am mistaken. Anyway, to address this:
> >> which mpirun gives me /usr/local/bin/mpirun, so to configure,
> >> ./configure --with-mpi=/usr/local/bin/mpirun, and to run,
> >> /usr/local/bin/mpirun -np X ... This should
> >> uname -a gives me: Linux macmanes 2.6.31-16-generic #52-Ubuntu
> >> SMP Thu Dec 3 22:07:16 UTC 2006 x86_64 GNU/Linux
> >>
> >> Matt
> >>
> >> On Dec 8, 2009, at 8:50 PM, Gus Correa wrote:
> >>
> >> Hi Matthew
> >>
> >> Please see comments/answers inline below.
> >>
> >> Matthew MacManes wrote:
> >>
> >> Hi Gus, thanks for your ideas. I have a few questions,
> >> and will try to answer yours in hopes of solving this!
> >>
> >> A simple way to test OpenMPI on your system is to run the
> >> test programs that come with the OpenMPI source code:
> >> hello_c.c, connectivity_c.c, and ring_c.c:
> >> http://www.open-mpi.org/
> >>
> >> Get the tarball from the OpenMPI site, gunzip and untar it,
> >> and look for them in the "examples" directory.
> >> Compile them with /your/path/to/openmpi/bin/mpicc hello_c.c
> >> Run them with /your/path/to/openmpi/bin/mpiexec -np X a.out
> >> using X = 2, 4, 8, 16, 32, 64, ...
> >>
> >> This will tell you if your OpenMPI is functional,
> >> and if you can run on many Nehalem cores,
> >> perhaps even with oversubscription.
> >> It will also set the stage for further investigation of your
> >> actual programs.
> >>
> >> Should I worry about setting things like --num-cores
> >> --bind-to-cores? This, I think, gets at your questions
> >> about processor affinity. Am I right? I could not
> >> exactly figure out the -mca mpi_paffinity_alone stuff...
> >>
> >> I use the simple-minded -mca mpi_paffinity_alone 1.
> >> This is probably the easiest way to assign a process to a core.
> >> There are more complex ways in OpenMPI, but I haven't tried them.
> >> Indeed, -mca mpi_paffinity_alone 1 does improve the performance
> >> of our programs here.
> >> There is a chance that without it the 16 virtual cores of
> >> your Nehalem get confused with more than 3 processes
> >> (you reported that -np > 3 breaks).
> >>
> >> Did you try adding just -mca mpi_paffinity_alone 1 to
> >> your mpiexec command line?
> >>
> >> 1. Additional load: nope, nothing else, most of the time
> >> not even Firefox.
> >>
> >> Good.
> >> Turn off Firefox, etc., to make it even better.
> >> Ideally, use runlevel 3, no X, like a computer-cluster node,
> >> but this may not be required.
> >>
> >> 2. RAM: no problems apparent when monitoring through
> >> TOP. Interestingly, I did wonder about oversubscription,
> >> so I tried the option --nooversubscription, but this
> >> gave me an error message.
> >>
> >> Oversubscription from your program would only happen if
> >> you asked for more processes than available cores, i.e.,
> >> -np > 8 (or "virtual" cores, in the case of Nehalem
> >> hyperthreading, -np > 16).
> >> Since you have -np = 4, there is no oversubscription,
> >> unless you have other external load (e.g. Matlab, etc.),
> >> but you said you don't.
> >>
> >> Yet another possibility would be if your program were threaded
> >> (e.g. using OpenMP along with MPI), but considering what you
> >> said about OpenMP, I would guess the programs don't use it.
> >> For instance, you launch the program with 4 MPI processes,
> >> and each process decides to start, say, 8 OpenMP threads.
> >> You end up with 32 threads on 8 (real) cores (or 16
> >> hyperthreaded ones on Nehalem).
> >>
> >> What else does top say?
> >> Any hog processes (memory- or CPU-wise)
> >> besides your program's processes?
> >>
> >> 3. I have not tried other MPI flavors. I've been
> >> speaking to the authors of the programs, and they are
> >> both using OpenMPI.
> >>
> >> I was not trying to convince you to use another MPI.
> >> I use MPICH2 also, but OpenMPI reigns here.
> >> The idea of trying it with MPICH2 was just to check whether
> >> OpenMPI is causing the problem, but I don't think it is.
> >>
> >> 4. I don't think that this is a problem, as I'm
> >> specifying --with-mpi=/usr/bin/... when I compile the
> >> programs. Is there any other way to be sure that this is
> >> not a problem?
> >>
> >> Hmmm ....
> >> I don't know about your Ubuntu (we have CentOS and Fedora on
> >> various machines).
> >> However, most Linux distributions come with their own MPI
> >> flavors, and so do compilers, etc.
> >> Oftentimes they install these goodies in unexpected places,
> >> and this has caused a lot of frustration.
> >> There are tons of postings on this list that eventually
> >> boiled down to mismatched versions of MPI in unexpected places.
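[Editor's note: the "mismatched MPI versions" failure mode described above can be caught with a quick sanity check: the mpicc used to compile and the mpiexec/mpirun used to launch should come from the same installation prefix. A small sketch of that check; the example paths are hypothetical, and on a real system you would pass in the output of `which mpicc` and `which mpiexec`:]

```python
import os

def same_mpi_install(mpicc_path, mpiexec_path):
    """Heuristic: both tools should live in the same bin/ directory,
    i.e. belong to the same MPI installation prefix."""
    return os.path.dirname(mpicc_path) == os.path.dirname(mpiexec_path)

# Consistent pair -- both from one hand-built OpenMPI (hypothetical paths):
ok = same_mpi_install("/home/matt/apps/openmpi-1.4/bin/mpicc",
                      "/home/matt/apps/openmpi-1.4/bin/mpiexec")   # True

# The classic trap -- compiled with a hand-built mpicc, but launched
# with the distro's /usr/bin/mpirun:
bad = same_mpi_install("/home/matt/apps/openmpi-1.4/bin/mpicc",
                       "/usr/bin/mpirun")                          # False
```

This is only a heuristic (symlinks or wrapper scripts can fool it); `ompi_info` and `mpicc --showme` remain the authoritative checks.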
> >>
> >> The easy way is to use full path names to compile and to run.
> >> Something like this:
> >> /my/openmpi/bin/mpicc (in your program's configuration script),
> >> and something like this:
> >> /my/openmpi/bin/mpiexec -np ... bla, bla ...
> >> when you submit the job.
> >>
> >> You can check your version with "which mpicc", "which
> >> mpiexec", and (perhaps using full path names) with
> >> "ompi_info", "mpicc --showme", "mpiexec --help".
> >>
> >> 5. I had not been, and you could see some shuffling when
> >> monitoring the load on specific processors. I have tried
> >> to use --bind-to-cores to deal with this. I don't
> >> understand how to use the -mca options you asked about.
> >> 6. I am using Ubuntu 9.10, gcc 4.4.1 and g++ 4.4.1.
> >>
> >> I am afraid I won't be of much help, because I don't have a Nehalem.
> >> However, I read about Nehalem requiring quite recent kernels
> >> to get all of its features working right.
> >>
> >> What is the output of "uname -a"?
> >> This will tell the kernel version, etc.
> >> Other list subscribers may give you a suggestion if you post
> >> the information.
> >>
> >> MrBayes is for Bayesian phylogenetics:
> >> http://mrbayes.csit.fsu.edu/wiki/index.php/Main_Page
> >> ABySS is a program for assembly of DNA sequence data:
> >> http://www.bcgsc.ca/platform/bioinfo/software/abyss
> >>
> >> Thanks for the links!
> >> I had found the MrBayes link.
> >> I eventually found what your ABySS was about, but no links.
> >> Amazing that it is about DNA/gene sequencing.
> >> Our abyss here is the deep ocean ... :)
> >> Abysmal difference!
> >>
> >> Do the programs mix MPI (message passing) with
> >> OpenMP (threads)?
> >>
> >> I'm honestly not sure what this means...
> >>
> >> Some programs mix the two.
> >> OpenMP only works in a shared-memory environment (e.g. a
> >> single computer like yours), whereas MPI can use both shared memory
> >> and work across a network (e.g. in a cluster).
> >> There are other differences too.
> >>
> >> It is unlikely that you have this hybrid type of parallel program;
> >> otherwise there would be some reference to OpenMP
> >> in the very program configuration files, program
> >> documentation, etc.
> >> Also, in general the configuration scripts of these hybrid
> >> programs can turn on MPI only, or OpenMP only, or both,
> >> depending on how you configure them.
> >>
> >> Even to compile with OpenMP you would need a proper compiler
> >> flag, but that one might be hidden in a Makefile too, making
> >> it a bit hard to find. "grep -n mp Makefile" may give a clue.
> >> Anything in the documentation that mentions threads or OpenMP?
> >>
> >> FYI, here is OpenMP:
> >> http://openmp.org/wp/
> >>
> >> Thanks for all your help!
> >>
> >> Matt
> >>
> >> Well, so far it didn't really help. :(
> >>
> >> But let's hope to find a clue,
> >> maybe with a little help from
> >> our list-subscriber friends.
> >>
> >> Gus Correa
> >>
> >> ---------------------------------------------------------------------
> >> Gustavo Correa
> >> Lamont-Doherty Earth Observatory - Columbia University
> >> Palisades, NY, 10964-8000 - USA
> >> ---------------------------------------------------------------------
> >>
> >> Hi Matthew
> >>
> >> More guesses/questions than anything else:
> >>
> >> 1) Is there any additional load on this machine?
> >> We had problems like that (on different machines)
> >> when users start listening to streaming video, doing Matlab
> >> calculations, etc., while the MPI programs are running.
> >> This tends to oversubscribe the cores, and may lead to crashes.
> >>
> >> 2) RAM:
> >> Can you monitor the RAM usage through "top"?
> >> (I presume you are on Linux.)
> >> It may show unexpected memory leaks, if they exist.
> >>
> >> In "top", type "1" (one) to see all cores; type "f"
> >> then "j" to see the core number associated with each process.
> >>
> >> 3) Do the programs work right with other MPI flavors (e.g.
MPICH2)?
> >> If not, then it is not OpenMPI's fault.
> >>
> >> 4) Any possibility that the MPI versions/flavors of
> >> mpicc and mpirun that you are using to compile and launch the
> >> program are not the same?
> >>
> >> 5) Are you setting processor affinity on mpiexec?
> >>
> >> mpiexec -mca mpi_paffinity_alone 1 -np ... bla, bla ...
> >>
> >> Context switching across the cores may also cause
> >> trouble, I suppose.
> >>
> >> 6) Which Linux are you using (uname -a)?
> >>
> >> On other mailing lists I read reports that only
> >> quite recent kernels support all the Intel Nehalem processor
> >> features well. I don't have a Nehalem, so I can't help here,
> >> but the information may be useful
> >> for other list subscribers to help you.
> >>
> >> ***
> >>
> >> As for the programs, some programs require a specific setup
> >> (and even specific compilation) when the number of
> >> MPI processes varies.
> >> It may help if you tell us links to the program sites.
> >>
> >> Bayesian statistics is not totally out of our
> >> business, but phylogenetic trees are not really my league, hence
> >> forgive me any bad guesses, please;
> >> but would it need specific compilation or a different
> >> set of input parameters to run correctly on a
> >> different number of processors?
> >> Do the programs mix MPI (message passing) with
> >> OpenMP (threads)?
> >>
> >> I found this MrBayes, which seems to do the above:
> >>
> >> http://mrbayes.csit.fsu.edu/
> >> http://mrbayes.csit.fsu.edu/wiki/index.php/Main_Page
> >>
> >> As for ABySS, what is it, and where can it be found?
> >> It doesn't look like a deep-ocean circulation model, as
> >> the name suggests.
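[Editor's note: the earlier "grep -n mp Makefile" suggestion for spotting hybrid MPI+OpenMP builds can be made a little more precise by matching whole compiler flags, so that "mp" does not fire on every token that merely contains those letters. A sketch; the flag list and the sample Makefile are illustrative assumptions, not taken from MrBayes or ABySS:]

```python
# Scan a Makefile's text for compiler flags that enable OpenMP.
# OPENMP_FLAGS covers gcc/gfortran (-fopenmp) plus a few other common
# compilers; extend the tuple as needed for your toolchain.

OPENMP_FLAGS = ("-fopenmp", "-openmp", "-qopenmp", "-mp", "-xopenmp")

def openmp_lines(makefile_text):
    """Return (line_number, line) pairs whose whitespace-split tokens
    include a known OpenMP flag."""
    hits = []
    for num, line in enumerate(makefile_text.splitlines(), start=1):
        if any(flag in line.split() for flag in OPENMP_FLAGS):
            hits.append((num, line.strip()))
    return hits

# Invented example: a hybrid MPI+OpenMP build might look like this.
sample = """CC = mpicc
CFLAGS = -O2 -fopenmp
LDFLAGS = -lm
"""
```

An empty result suggests the build is MPI-only; a hit means the program may spawn OpenMP threads per MPI process, which is exactly the oversubscription scenario discussed earlier in the thread.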
> >>
> >> My $0.02
> >> Gus Correa
> >>
> >> ------------------------------------------------------------------------
> >> _______________________________________________
> >> users mailing list
> >> us...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>
> >> _________________________________
> >> Matthew MacManes
> >> PhD Candidate
> >> University of California - Berkeley
> >> Museum of Vertebrate Zoology
> >> Phone: 510-495-5833
> >> Lab Website: http://ib.berkeley.edu/labs/lacey
> >> Personal Website: http://macmanes.com/

-- 
Mattijs Janssens

OpenCFD Ltd.
9 Albert Road, Caversham, Reading RG4 7AN.
Tel: +44 (0)118 9471030
Email: m.janss...@opencfd.co.uk
URL: http://www.OpenCFD.co.uk