Hi All, 

I agree that the issue is troublesome.  It apparently has been reported, and 
there is an active bug report, with some technical discussion of the underlying 
problems, found here: https://svn.open-mpi.org/trac/ompi/ticket/2043

For now, running with the workaround is OK, but it is an issue that hopefully 
will be resolved sooner rather than later. 
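
For anyone else who hits the same hang in the meantime, the workaround Mark
describes below boils down to excluding the shared-memory BTL, roughly like this
(my install path; adjust as needed):

/home/macmanes/apps/openmpi1.4/bin/mpirun -mca btl ^sm -np 8 connectivity_c

As I understand it, on a single node the MPI traffic then falls back to TCP over
the loopback interface instead of shared memory.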

Thanks again for everybody's help!
Matt


On Dec 10, 2009, at 2:01 PM, Gus Correa wrote:

> Hi Mark, Matthew, list
> 
> Oh well, Mark's direct experience on a Nehalem
> is a game changer, and his recommendation to turn off the shared
> memory feature may be the way to go for Matthew, at least to have
> things working.
> Thank you Mark, your interjection sheds new light on the awkward
> situation reported by Matthew.
> I don't have a Nehalem platform, hence I cannot do any testing.
> 
> A couple of questions to the OpenMPI pros:
> If shared memory ("sm") is turned off on a standalone computer,
> which mechanism is used for MPI communication?
> TCP via loopback port?  Other?
> Why wouldn't shared memory work right on Nehalem?
> (That is probably distressing for Mark, Matthew, and other
> Nehalem owners.)
> 
> So, judging from Mark's experiments,
> it looks like Nehalem, or perhaps its interaction with
> the current Linux kernels, still hasn't solved problems regarding
> efficient memory access.
> Or is this a rough misinterpretation of Mark's experiences?
> 
> It is amazing to me that this issue hasn't surfaced on this list
> before.
> Or maybe it did and I wasn't paying attention, since
> I don't have Nehalem.
> After all, this is about the very basic functionality of MPI
> on the latest hardware, which has been on the market for several
> months now.
> 
> Is anybody running MPI production code on Nehalem
> who can report scaling experiments, perhaps comparing with other
> hardware platforms?
> 
> Any possibility that tweaking BIOS settings or
> special kernel parameters could solve this problem?
> 
> Any word from the pros on the list who have direct experience
> with Nehalem and OpenMPI?
> 
> Has anybody experimented with MPICH2 on a single-node, dual-socket
> Nehalem, for comparison?
> 
> Thanks,
> Gus Correa
> ---------------------------------------------------------------------
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> ---------------------------------------------------------------------
> 
> 
> Mark Bolstad wrote:
>> Just a quick interjection, I also have a dual-quad Nehalem system, HT on, 
>> 24GB ram, hand compiled 1.3.4 with options: --enable-mpi-threads 
>> --enable-mpi-f77=no --with-openib=no
>> With v1.3.4 I see roughly the same behavior: hello and ring work, but connectivity 
>> fails randomly with np >= 8. Turning on -v increased the success rate, but it still 
>> hangs. np = 16 fails more often, and the hang is random in which pair of 
>> processes is communicating.
>> However, it seems to be related to the shared memory layer problem. Running 
>> with -mca btl ^sm works consistently through np = 128.
>> Hope this helps.
>> Mark
>> On Wed, Dec 9, 2009 at 8:03 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>>    Hi Matthew
>>    Barring any misinterpretation I may have made of the code:
>>    Hello_c has no real communication, except for a final Barrier
>>    synchronization.
>>    Each process prints "hello world" and that's it.
>>    Ring probes a little more, with processes Send(ing) and
>>    Recv(eiving) messages.
>>    Ring just passes a message sequentially along all process
>>    ranks, then back to rank 0, and repeat the game 10 times.
>>    Rank 0 is in charge of counting turns, decrementing the counter,
>>    and printing that (nobody else prints).
>>    With 4 processes:
>>    0->1->2->3->0->1... 10 times
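>>    In rough C, the idea of ring_c is something like the sketch below
>>    (from memory, not the exact code shipped in the examples directory,
>>    so treat it as illustration only):
>>
>>    #include <stdio.h>
>>    #include <mpi.h>
>>
>>    int main(int argc, char *argv[])
>>    {
>>        int rank, size, next, prev, message, tag = 201;
>>
>>        MPI_Init(&argc, &argv);
>>        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>        MPI_Comm_size(MPI_COMM_WORLD, &size);
>>
>>        next = (rank + 1) % size;          /* neighbor to send to   */
>>        prev = (rank + size - 1) % size;   /* neighbor to recv from */
>>
>>        if (rank == 0) {                   /* rank 0 injects the message */
>>            message = 10;
>>            MPI_Send(&message, 1, MPI_INT, next, tag, MPI_COMM_WORLD);
>>        }
>>
>>        while (1) {
>>            MPI_Recv(&message, 1, MPI_INT, prev, tag, MPI_COMM_WORLD,
>>                     MPI_STATUS_IGNORE);
>>            if (rank == 0) {               /* only rank 0 counts the laps */
>>                message--;
>>                printf("Process 0 decremented value: %d\n", message);
>>            }
>>            MPI_Send(&message, 1, MPI_INT, next, tag, MPI_COMM_WORLD);
>>            if (message == 0)              /* last lap: pass the 0 on and exit */
>>                break;
>>        }
>>
>>        if (rank == 0)                     /* absorb the final message */
>>            MPI_Recv(&message, 1, MPI_INT, prev, tag, MPI_COMM_WORLD,
>>                     MPI_STATUS_IGNORE);
>>
>>        printf("Process %d exiting\n", rank);
>>        MPI_Finalize();
>>        return 0;
>>    }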
>>    In connectivity every pair of processes exchanges a message.
>>    Therefore it probes all pairwise connections.
>>    In verbose mode you can see that.
>>    These programs shouldn't hang at all if the system is sane.
>>    Actually, they should even run with a significant level of
>>    oversubscription, say,
>>    -np 128  should work easily for all three programs on a powerful
>>    machine like yours.
>>    **
>>    Suggestions
>>    1) Stick to the OpenMPI you compiled.
>>    **
>>    2) You can run connectivity_c in verbose mode:
>>    /home/macmanes/apps/openmpi1.4/bin/mpirun -np 8 connectivity_c -v
>>    (Note the trailing "-v".)
>>    It should tell more about who's talking to who.
>>    **
>>    3) I wonder if there are any BIOS settings that may be required
>>    (and perhaps not in place) to make the Nehalem hyperthreading
>>    work properly on your computer.
>>    You reach the BIOS settings by typing <DEL> or <F2>
>>    when the computer boots up.
>>    The key varies by
>>    BIOS and computer vendor, but shows quickly on the bootup screen.
>>    You may ask the computer vendor about the recommended BIOS settings.
>>    If you haven't done this before, be careful to change and save only
>>    what really needs to change (if anything really needs to change),
>>    or the result may be worse.
>>    (Overclocking is for gamers, not for genome researchers ... :) )
>>    **
>>    4) What I read about Nehalem DDR3 memory is that it is optimal
>>    on configurations that are multiples of 3GB per CPU.
>>    Common configs. in dual CPU machines like yours are
>>    6, 12, 24 and 48GB.
>>    The sockets where you install the memory modules also matter.
>>    Your computer has 20GB.
>>    Did you build the computer or upgrade the memory yourself?
>>    Do you know how the memory is installed, in which memory sockets?
>>    What does the vendor have to say about it?
>>    See this:
>>    
>> http://en.community.dell.com/blogs/dell_tech_center/archive/2009/04/08/nehalem-and-memory-configurations.aspx
>>    **
>>    5) As I said before, typing "f" then "j" on "top" will add
>>    a column (labeled "P") that shows in which core each process is running.
>>    This will let you observe how the Linux scheduler is distributing
>>    the MPI load across the cores.
>>    Hopefully it is load-balanced, and different processes go to different
>>    cores.
>>    ***
>>    It is very disconcerting when MPI processes hang.
>>    You are not alone.
>>    The reasons are not always obvious.
>>    At least in your case there is no network involved to troubleshoot.
>>    **
>>    I hope it helps,
>>    Gus Correa
>>    ---------------------------------------------------------------------
>>    Gustavo Correa
>>    Lamont-Doherty Earth Observatory - Columbia University
>>    Palisades, NY, 10964-8000 - USA
>>    ---------------------------------------------------------------------
>>    Matthew MacManes wrote:
>>        Hi Gus and List,
>>        1st of all Gus, I want to say thanks.. you have been a huge
>>        help, and when I get this fixed, I owe you big time!
>>        However, the problems continue...
>>        I formatted the HD, reinstalled OS to make sure that I was
>>        working from scratch.  I did your step A, which seemed to go fine:
>>        macmanes@macmanes:~$ which mpicc
>>        /home/macmanes/apps/openmpi1.4/bin/mpicc
>>        macmanes@macmanes:~$ which mpirun
>>        /home/macmanes/apps/openmpi1.4/bin/mpirun
>>        Good stuff there...
>>        I then compiled the example files:
>>        macmanes@macmanes:~/Downloads/openmpi-1.4/examples$
>>        /home/macmanes/apps/openmpi1.4/bin/mpirun -np 8 ring_c
>>        Process 0 sending 10 to 1, tag 201 (8 processes in ring)
>>        Process 0 sent to 1
>>        Process 0 decremented value: 9
>>        Process 0 decremented value: 8
>>        Process 0 decremented value: 7
>>        Process 0 decremented value: 6
>>        Process 0 decremented value: 5
>>        Process 0 decremented value: 4
>>        Process 0 decremented value: 3
>>        Process 0 decremented value: 2
>>        Process 0 decremented value: 1
>>        Process 0 decremented value: 0
>>        Process 0 exiting
>>        Process 1 exiting
>>        Process 2 exiting
>>        Process 3 exiting
>>        Process 4 exiting
>>        Process 5 exiting
>>        Process 6 exiting
>>        Process 7 exiting
>>        macmanes@macmanes:~/Downloads/openmpi-1.4/examples$
>>        /home/macmanes/apps/openmpi1.4/bin/mpirun -np 8 connectivity_c
>>        Connectivity test on 8 processes PASSED.
>>        macmanes@macmanes:~/Downloads/openmpi-1.4/examples$
>>        /home/macmanes/apps/openmpi1.4/bin/mpirun -np 8 connectivity_c
>>        ..HANGS..NO OUTPUT
>>        This is maddening because ring_c works, and connectivity_c
>>        worked the first time but not the second... I did it 10 times,
>>        and it worked twice. Here is the TOP screenshot:
>>        
>> http://picasaweb.google.com/macmanes/DropBox?authkey=Gv1sRgCLKokNOVqo7BYw#5413382182027669394
>>        What is the difference between connectivity_c and ring_c? Under
>>        what circumstances should one fail and not the other...
>>        I'm off to the Linux forums to see about the Nehalem kernel issues..
>>        Matt
>>        On Wed, Dec 9, 2009 at 13:25, Gus Correa <g...@ldeo.columbia.edu> wrote:
>>           Hi Matthew
>>           There is no point in trying to troubleshoot MrBayes and ABySS
>>           if not even the OpenMPI test programs run properly.
>>           You must straighten them out first.
>>           **
>>           Suggestions:
>>           **
>>           A) While you are at OpenMPI, do yourself a favor,
>>           and install it from source on a separate directory.
>>           Who knows if the OpenMPI package distributed with Ubuntu
>>           works right on Nehalem?
>>           Better install OpenMPI yourself from source code.
>>           It is not a big deal, and may save you further trouble.
>>           Recipe:
>>           1) Install gfortran and g++ if you don't have them using apt-get.
>>           2) Put the OpenMPI tarball in, say /home/matt/downloads/openmpi
>>           3) Make another install directory *not in the system
>>        directory tree*.
>>           Something like "mkdir /home/matt/apps/openmpi-X.Y.Z/"
>>        (X.Y.Z=version)
>>           will work
>>           4) cd /home/matt/downloads/openmpi
>>           5) ./configure CC=gcc CXX=g++ F77=gfortran FC=gfortran  \
>>           --prefix=/home/matt/apps/openmpi-X.Y.Z
>>           (Use the prefix flag to install in the directory of item 3.)
>>           6) make
>>           7) make install
>>           8) At the bottom of your /home/matt/.bashrc or .profile file
>>           put these lines:
>>           export PATH=/home/matt/apps/openmpi-X.Y.Z/bin:${PATH}
>>           export MANPATH=/home/matt/apps/openmpi-X.Y.Z/share/man:`man -w`
>>           export LD_LIBRARY_PATH=/home/matt/apps/openmpi-X.Y.Z/lib:${LD_LIBRARY_PATH}
>>           (If you use csh/tcsh use instead:
>>           setenv PATH /home/matt/apps/openmpi-X.Y.Z/bin:${PATH}
>>           etc)
>>           9) Logout and login again to freshen up the environment
>>        variables.
>>           10) Do "which mpicc"  to check that it is pointing to your newly
>>           installed OpenMPI.
>>           11) Recompile and rerun the OpenMPI test programs
>>           with 2, 4, 8, 16, .... processors.
>>           Use full path names to mpicc and to mpirun,
>>           if the change of PATH above doesn't work right.
>>           ********
>>           B) Nehalem is quite new hardware.
>>           I don't know if the Ubuntu kernel 2.6.31-16 fully supports all
>>           of Nehalem features, particularly hyperthreading, and NUMA,
>>           which are used by MPI programs.
>>           I am not the right person to give you advice about this.
>>           I googled but couldn't find clear information about
>>           minimal kernel age/requirements to have Nehalem fully supported.
>>           Some Nehalem owner in the list could come forward and tell.
>>           **
>>           C) On the top screenshot you sent me, please try it again
>>           (after you do item A) but type "f" and "j" to show the processors
>>           that are running each process.
>>           **
>>           D) Also, the screenshot shows 20GB of memory.
>>           This does not sound like an optimal memory configuration for Nehalem,
>>           which tends to be 6GB, 12GB, 24GB, or 48GB.
>>           Did you put together the system or upgrade the memory yourself,
>>           or did you buy the computer as is?
>>           However, this should not break MPI anyway.
>>           **
>>           E) Answering your question:
>>           It is true that different flavors of MPI
>>           used to compile (mpicc) and run (mpiexec) a program would
>>        probably
>>           break right away, regardless of the number of processes.
>>           However, when it comes to different versions of the
>>           same MPI flavor (say OpenMPI 1.3.4 and OpenMPI 1.3.3)
>>           I am not sure it will break.
>>           I would guess it may run but not in a reliable way.
>>           Problems may appear as you stress the system with more cores,
>>        etc.
>>           But this is just a guess.
>>           **
>>           I hope this helps,
>>           Gus Correa
>>                  
>> ---------------------------------------------------------------------
>>           Gustavo Correa
>>           Lamont-Doherty Earth Observatory - Columbia University
>>           Palisades, NY, 10964-8000 - USA
>>                  
>> ---------------------------------------------------------------------
>>           Matthew MacManes wrote:
>>               Hi Gus,
>>               Interestingly, the connectivity_c test works
>>               fine with -np <8. For -np >8 it works some of the time, other
>>               times it HANGS. I have got to believe that this is a big
>>        clue!!
>>               Also, when it hangs, sometimes I get the message "mpirun was
>>               unable to cleanly terminate the daemons on the nodes shown
>>               below" Note that NO nodes are shown below.   Once, I got
>>        -np 250
>>               to pass the connectivity test, but I was not able to
>>               replicate this reliably, so I'm not sure if it was a fluke
>>               or what. Here is a link to a screenshot of TOP when
>>               connectivity_c is hung with -np 14.. I see that 2 processes
>>               are only at 50% CPU usage.. Hmmmm
>> http://picasaweb.google.com/lh/photo/87zVEucBNFaQ0TieNVZtdw?authkey=Gv1sRgCLKokNOVqo7BYw&feat=directlink
>>               The other tests, ring_c and hello_c, as well as the cxx
>>               versions of these guys, work with all values of -np.
>>               Using -mca mpi_paffinity_alone 1 I get the same behavior.
>>               I agree that I should worry about the mismatch between
>>        where
>>               the libraries are installed versus where I am telling my
>>               programs to look for them. Would this type of mismatch cause
>>               behavior like what I am seeing, i.e. working with  a small
>>               number of processors, but failing with larger?  It seems
>>        like a
>>               mismatch would have the same effect regardless of the
>>        number of
>>               processors used. Maybe I am mistaken. Anyway, to address
>>        this,
>>               which mpirun gives me /usr/local/bin/mpirun.. so to configure
>>               ./configure --with-mpi=/usr/local/bin/mpirun and to run
>>               /usr/local/bin/mpirun -np X ...  This should
>>               uname -a gives me: Linux macmanes 2.6.31-16-generic
>>        #52-Ubuntu
>>               SMP Thu Dec 3 22:07:16 UTC 2006 x86_64 GNU/Linux
>>               Matt
>>               On Dec 8, 2009, at 8:50 PM, Gus Correa wrote:
>>                   Hi Matthew
>>                   Please see comments/answers inline below.
>>                   Matthew MacManes wrote:
>>                       Hi Gus, Thanks for your ideas.. I have a few
>>        questions,
>>                       and will try to answer yours in hopes of solving
>>        this!!
>>                   A simple way to test OpenMPI on your system is to run the
>>                   test programs that come with the OpenMPI source code,
>>                   hello_c.c, connectivity_c.c, and ring_c.c:
>>                   http://www.open-mpi.org/
>>                   Get the tarball from the OpenMPI site, gunzip and untar it,
>>                   and look for it in the "examples" directory.
>>                   Compile it with /your/path/to/openmpi/bin/mpicc hello_c.c
>>                   Run it with /your/path/to/openmpi/bin/mpiexec -np X a.out
>>                   using X = 2, 4, 8, 16, 32, 64, ...
>>                   This will tell if your OpenMPI is functional,
>>                   and if you can run on many Nehalem cores,
>>                   even with oversubscription perhaps.
>>                   It will also set the stage for further investigation
>>        of your
>>                   actual programs.
>>                       Should I worry about setting things like --num-cores
>>                       --bind-to-cores?  This, I think, gets at your
>>        questions
>>                       about processor affinity.. Am I right? I could not
>>                       exactly figure out the -mca mpi-paffinity_alone
>>        stuff...
>>                   I use the simple-minded -mca mpi_paffinity_alone 1.
>>                   This is probably the easiest way to assign a process
>>        to a core.
>>                   There are more complex ways in OpenMPI, but I haven't tried them.
>>                   Indeed, -mca mpi_paffinity_alone 1 does improve
>>        performance of
>>                   our programs here.
>>                   There is a chance that without it the 16 virtual cores of
>>                   your Nehalem get confused with more than 3 processes
>>                   (you reported that -np > 3 breaks).
>>                   Did you try adding just -mca mpi_paffinity_alone 1 to
>>                   your mpiexec command line?
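>>                   For example (substituting your own install path):
>>                   /your/path/to/openmpi/bin/mpiexec -mca mpi_paffinity_alone 1 -np 4 a.out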
>>                       1. Additional load: nope. nothing else, most of
>>        the time
>>                       not even firefox.
>>                   Good.
>>                   Turn off firefox, etc, to make it even better.
>>                   Ideally, use runlevel 3, no X, like a computer
>>        cluster node,
>>                   but this may not be required.
>>                       2. RAM: no problems apparent when monitoring through
>>                       TOP. Interesting, I did wonder about
>>        oversubscription,
>>                       so I tried the option --nooversubscription, but this
>>                       gave me an error message.
>>                   Oversubscription from your program would only happen if
>>                   you asked for more processes than available cores, i.e.,
>>                   -np > 8 (or "virtual" cores, in case of Nehalem
>>        hyperthreading,
>>                   -np > 16).
>>                   Since you have -np=4 there is no oversubscription,
>>                   unless you have other external load (e.g. Matlab, etc),
>>                   but you said you don't.
>>                   Yet another possibility would be if your program is
>>        threaded
>>                   (e.g. using OpenMP along with MPI), but considering
>>        what you
>>                   said about OpenMP I would guess the programs don't
>>        use it.
>>                   For instance, you launch the program with 4 MPI
>>        processes,
>>                   and each process decides to start, say, 8 OpenMP threads.
>>                   You end up with 32 threads and 8 (real) cores (or 16
>>                   hyperthreaded
>>                   ones on Nehalem).
>>                   What else does top say?
>>                   Any hog processes (memory- or CPU-wise)
>>                   besides your program processes?
>>                       3. I have not tried other MPI flavors. I've been
>>                       speaking to the authors of the programs, and they are
>>                       both using OpenMPI.
>>                   I was not trying to convince you to use another MPI.
>>                   I use MPICH2 also, but OpenMPI reigns here.
>>                   The idea of trying it with MPICH2 was just to check
>>        whether
>>                   OpenMPI
>>                   is causing the problem, but I don't think it is.
>>                       4. I don't think that this is a problem, as I'm
>>                       specifying --with-mpi=/usr/bin/...  when I
>>        compile the
>>                       programs. Is there any other way to be sure that
>>        this is
>>                       not a problem?
>>                   Hmmm ....
>>                   I don't know about your Ubuntu (we have CentOS and
>>        Fedora on
>>                   various
>>                   machines).
>>                   However, most Linux distributions come with their MPI
>>        flavors,
>>                   and so do compilers, etc.
>>                   Often times they install these goodies in unexpected
>>        places,
>>                   and this has caused a lot of frustration.
>>                   There are tons of postings on this list that eventually
>>                   boiled down to mismatched versions of MPI in
>>        unexpected places.
>>                   The easy way is to use full path names to compile and
>>        to run.
>>                   Something like this:
>>                   /my/openmpi/bin/mpicc (in your program configuration script),
>>                   and something like this
>>                   /my/openmpi/bin/mpiexec -np  ... bla, bla ...
>>                   when you submit the job.
>>                   You can check your version with "which mpicc", "which
>>        mpiexec",
>>                   and (perhaps using full path names) with
>>                   "ompi_info", "mpicc --showme", "mpiexec --help".
>>                       5. I had not been, and you could see some
>>        shuffling when
>>                       monitoring the load on specific processors. I
>>        have tried
>>                       to use --bind-to-cores to deal with this. I don't
>>                       understand how to use the -mca options you asked
>>        about.
>>                       6. I am using Ubuntu 9.10. gcc 4.4.1 and g++  4.4.1
>>                   I am afraid I won't be of help, because I don't have
>>        Nehalem.
>>                   However, I read about Nehalem requiring quite recent
>>        kernels
>>                   to get all of its features working right.
>>                   What is the output of "uname -a"?
>>                   This will tell the kernel version, etc.
>>                   Other list subscribers may give you a suggestion if
>>        you post the
>>                   information.
>>                       MrBayes is a program for Bayesian phylogenetics:
>>                        http://mrbayes.csit.fsu.edu/wiki/index.php/Main_Page
>>                       ABySS: is a program for assembly of DNA sequence
>>        data:
>>                       http://www.bcgsc.ca/platform/bioinfo/software/abyss
>>                   Thanks for the links!
>>                   I had found the MrBayes link.
>>                   I eventually found what your ABySS was about, but no
>>        links.
>>                   Amazing that it is about DNA/gene sequencing.
>>                   Our abyss here is the deep ocean ... :)
>>                   Abysmal difference!
>>                           Do the programs mix MPI (message passing) with
>>                           OpenMP (threads)?
>>                       Im honestly not sure what this means..
>>                   Some programs mix the two.
>>                   OpenMP only works in a shared memory environment
>>        (e.g. a single
>>                   computer like yours), whereas MPI can use both shared
>>        memory
>>                   and work across a network (e.g. in a cluster).
>>                   There are other differences too.
>>                   Unlikely that you have this hybrid type of parallel
>>        program,
>>                   otherwise there would be some reference to OpenMP
>>                   on the very program configuration files, program
>>                   documentation, etc.
>>                   Also, in general the configuration scripts of these
>>        hybrid
>>                   programs can turn on MPI only, or OpenMP only, or both,
>>                   depending on how you configure.
>>                   Even to compile with OpenMP you would need a proper
>>        compiler
>>                   flag, but that one might be hidden in a Makefile too,
>>                   making it a bit hard to find. "grep -n mp Makefile" may
>>                   give a clue.
>>                   Anything on the documentation that mentions threads
>>        or OpenMP?
>>                   FYI, here is OpenMP:
>>                   http://openmp.org/wp/
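>>                   Just to make the idea concrete, a toy hybrid program
>>                   might look like this (made up for illustration only,
>>                   not anything from MrBayes or ABySS):
>>
>>                   #include <stdio.h>
>>                   #include <mpi.h>
>>                   #include <omp.h>
>>
>>                   int main(int argc, char *argv[])
>>                   {
>>                       int rank;
>>                       MPI_Init(&argc, &argv);
>>                       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>
>>                       /* each MPI process starts its own team of threads; */
>>                       /* "mpiexec -np 4" with 8 threads each = 32 threads */
>>                       #pragma omp parallel num_threads(8)
>>                       printf("MPI rank %d, OpenMP thread %d\n",
>>                              rank, omp_get_thread_num());
>>
>>                       MPI_Finalize();
>>                       return 0;
>>                   }
>>
>>                   You would compile it with something like
>>                   "mpicc -fopenmp toy.c" and launch it with mpiexec,
>>                   and the thread count multiplies as described above.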
>>                       Thanks for all your help!
>>                    > Matt
>>                   Well, so far it didn't really help. :(
>>                   But let's hope to find a clue,
>>                   maybe with a little help of
>>                   our list subscriber friends.
>>                   Gus Correa
>>                          
>> ---------------------------------------------------------------------
>>                   Gustavo Correa
>>                   Lamont-Doherty Earth Observatory - Columbia University
>>                   Palisades, NY, 10964-8000 - USA
>>                          
>> ---------------------------------------------------------------------
>>                           Hi Matthew
>>                           More guesses/questions than anything else:
>>                           1) Is there any additional load on this machine?
>>                           We had problems like that (on different
>>        machines) when
>>                           users start listening to streaming video, doing
>>                           Matlab calculations,
>>                           etc, while the MPI programs are running.
>>                           This tends to oversubscribe the cores, and
>>        may lead
>>                           to crashes.
>>                           2) RAM:
>>                           Can you monitor the RAM usage through "top"?
>>                           (I presume you are on Linux.)
>>                           It may show unexpected memory leaks, if they
>>        exist.
>>                           On "top", type "1" (one) to see all cores, type "f"
>>                           then "j"
>>                           to see the core number associated to each
>>        process.
>>                           3) Do the programs work right with other MPI
>>        flavors
>>                           (e.g. MPICH2)?
>>                           If not, then it is not OpenMPI's fault.
>>                           4) Any possibility that the MPI
>>        versions/flavors of
>>                           mpicc and
>>                           mpirun that you are using to compile and
>>        launch the
>>                           program are not the
>>                           same?
>>                           5) Are you setting processor affinity on mpiexec?
>>                           mpiexec -mca mpi_paffinity_alone 1 -np ...
>>        bla, bla ...
>>                           Context switching across the cores may also cause
>>                           trouble, I suppose.
>>                           6) Which Linux are you using (uname -a)?
>>                           On other mailing lists I read reports that only
>>                           quite recent kernels
>>                           support all the Intel Nehalem processor
>>        features well.
>>                           I don't have Nehalem, I can't help here,
>>                           but the information may be useful
>>                           for other list subscribers to help you.
>>                           ***
>>                           As for the programs, some programs require specific
>>                           setup (and even specific compilation) when the
>>                           number of MPI processes varies.
>>                           It may help if you tell us a link to the
>>        program sites.
>>                           Bayesian statistics is not totally out of our
>>                           business, but phylogenetic trees are not really
>>                           my league,
>>                           hence forgive me any bad guesses, please,
>>                           but would it need specific compilation or a
>>        different
>>                           set of input parameters to run correctly on a
>>        different
>>                           number of processors?
>>                           Do the programs mix MPI (message passing) with
>>                           OpenMP (threads)?
>>                           I found this MrBayes, which seems to do the
>>        above:
>>                           http://mrbayes.csit.fsu.edu/
>>                                  
>> http://mrbayes.csit.fsu.edu/wiki/index.php/Main_Page
>>                           As for the ABySS, what is it, where can it be
>>        found?
>>                           Doesn't look like a deep ocean circulation
>>        model, as
>>                           the name suggests.
>>                           My $0.02
>>                           Gus Correa
>>                              
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

_________________________________
Matthew MacManes
PhD Candidate
University of California- Berkeley
Museum of Vertebrate Zoology
Phone: 510-495-5833
Lab Website: http://ib.berkeley.edu/labs/lacey
Personal Website: http://macmanes.com/





