You might want to run some performance testing of your TCP stacks and
the switch -- use a non-MPI application such as NetPIPE (or others --
google around) and see what kind of throughput you get. Try it
between individual pairs of servers first, then run it simultaneously
between a bunch of peers and see if the results differ, etc.
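
For example, with NetPIPE's TCP module it would look something like the
following (a rough sketch -- node01/node02 are placeholder hostnames):

    # on node01: start the receiver side
    NPtcp

    # on node02: point the transmitter at the receiver
    NPtcp -h node01

Run one pair at a time to get a baseline, then launch several such pairs
at once and see whether the aggregate throughput through the switch drops.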
On Aug 17, 2009, at 5:51 PM, Craig Plaisance wrote:
Hi - I have compiled vasp 4.6.34 using the Intel Fortran compiler 11.1
with openmpi 1.3.3 on a cluster of 104 nodes running Rocks 5.2, each
with two quad-core Opterons, connected by Gbit Ethernet. Running in
parallel on one node (8 cores) works very well, faster than on any
other cluster I have run it on. However, running on 2 nodes in
parallel only improves performance by 10% over the one-node case,
while running on 4 and 8 nodes yields no improvement over the two-node
case. Furthermore, when running multiple (3-4) jobs simultaneously,
performance decreases by around 50% compared to running only a single
job on the entire cluster. The nodes are connected by a Dell
Powerconnect 6248 managed switch. I get the same performance with
mpich2, so I don't think the problem is specific to openmpi. Other
vasp users have reported very good scaling up to 4 nodes on a similar
cluster, so I don't think the problem is vasp either. Could something
be wrong with the way mpi is configured to work with the switch? Or is
the operating system not configured to work with the switch properly?
Or does the switch itself need to be configured?
Thanks!
--
Jeff Squyres
jsquy...@cisco.com