Sorry for the delay in replying -- INBOX deluge makes me miss emails
on the users list sometimes.
I'm unfortunately not familiar with GAMESS -- have you checked with
their support lists or documentation?
Note that Open MPI's IB progression engine will spin hard to make
progress for message passing. Specifically, if you have processes
that are "blocking" in message-passing calls, those processes will
actually be spinning trying to make progress (vs. truly blocking in
the kernel). So if you overload your hosts -- meaning that you run
more Open MPI processes than there are cores -- you could well see a
dramatic slowdown in overall performance, because every MPI process
will be competing for CPU cycles.
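If oversubscription turns out to be the issue, one thing to try is to
make sure the number of processes you launch matches the slots you
give Open MPI, or to tell idle processes to yield the CPU instead of
spinning. A rough sketch -- the hostnames, slot counts, and the
launch command below are just placeholders for your setup:

  # hostfile: one line per node, slots = number of cores
  node01 slots=8
  node02 slots=8

  # launch no more processes than there are slots
  mpirun --hostfile hostfile -np 16 <your GAMESS launch command>

  # or, if you have to oversubscribe, let idle processes yield the CPU
  mpirun --hostfile hostfile --mca mpi_yield_when_idle 1 -np 32 \
      <your GAMESS launch command>

mpi_yield_when_idle trades some latency for not pegging the CPUs, so
it's mostly useful as a diagnostic to see whether the spinning is
what is hurting you.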
On Feb 24, 2009, at 4:57 AM, Thomas Exner wrote:
Dear all:
Because I am new to this list, I would like to introduce myself: I am
Thomas Exner. Please excuse any silly questions -- I am only a
chemist.
And now my problem, which I have been fiddling with for almost a
week: I am trying to use GAMESS with Open MPI over InfiniBand. There
is a good description of how to compile it with MPI, and it can be
done, even if it is not easy. But at run time everything gets weird.
The specialty of GAMESS is that it runs twice as many MPI processes
as are used for the computation. The second half act as data servers,
which serve data but have very little CPU load. Each of these data
servers is tied to a specific compute process, so the two
corresponding processes have to run on the same node. On one node
everything is fine (2x4-core machines in my case), because all the
processes are guaranteed to run on that node. With two nodes,
everything is also fine at the beginning: 8 compute processes and 8
data servers are running on each machine. But after a short while,
the entire set of 16 processes on the first node starts to accumulate
CPU time with nothing useful happening, and the second node's
processes go entirely to sleep. Is it possible that all the compute
processes have for some reason been transferred to the first node?
That would explain the load of 16 on the first node and 0 on the
second, because then 16 compute processes (100% CPU load) and 16 data
servers (almost 0% load) would be running on the first and second
node, respectively. The strange thing is that the same build runs
fine over Gigabit Ethernet and Myrinet.
It would be great if somebody could help me with this. If you need
more information, I will be happy to share it with you.
Thanks very much.
Thomas
--
Jeff Squyres
Cisco Systems