Dear all, since I am new to this list, let me introduce myself: I am Thomas Exner. Please excuse any silly questions; I am only a chemist.
And now my problem, which I have been fiddling with for almost a week: I am trying to use GAMESS with Open MPI over InfiniBand. There is a good description of how to compile it with MPI, and it can be done, even if it is not easy. But at run time everything gets weird.

The special thing about GAMESS is that it starts twice as many MPI processes as are used for the computation. The second half act as data servers, which handle data but put very little load on the CPU. Each data server is paired with one specific compute process, so the two processes of each pair have to run on the same node.

On one node (2x4-core machines in my case) everything is fine, because all processes are guaranteed to run on that node. With two nodes, everything is also fine at the beginning: 8 compute processes and 8 data servers are running on each machine. But after a short while, all 16 processes on the first node start to accumulate CPU time without anything useful happening, and the processes on the second node go entirely to sleep. Is it possible that, for some reason, all the compute processes are being transferred to the first node? That would explain the load of 16 on the first node and 0 on the second: the first node would then be running 16 compute processes (100% CPU load each) and the second 16 data servers (almost 0% load each).

The strange thing is that the same version runs fine over Gigabit Ethernet and Myrinet. It would be great if somebody could help me with this. If you need more information, I will be happy to share it. Thanks very much.
Thomas
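To make the pairing question concrete, here is a minimal sketch of how rank placement interacts with the compute/data-server pairing. It rests on an assumption that I have not confirmed: ranks 0..N-1 are compute processes, and compute rank i is served by data-server rank i+N. The helpers `node_of_rank` and `split_pairs` are hypothetical, and the two policies are simplified models of filling one node before the next ("slot") versus placing ranks round-robin across nodes ("node").

```python
def node_of_rank(rank, n_nodes, procs_per_node, mapping):
    """Return the node index a rank lands on under a simplified placement model.

    'slot': fill node 0 completely, then node 1, and so on (block placement).
    'node': rank 0 -> node 0, rank 1 -> node 1, ... (round-robin placement).
    """
    if mapping == "slot":
        return rank // procs_per_node
    if mapping == "node":
        return rank % n_nodes
    raise ValueError(f"unknown mapping: {mapping!r}")


def split_pairs(n_nodes, procs_per_node, mapping):
    """List (compute_rank, server_rank) pairs placed on different nodes.

    ASSUMPTION (hypothetical): the first half of the ranks compute, the
    second half are data servers, and compute rank i is served by rank
    i + n_compute.
    """
    total = n_nodes * procs_per_node
    n_compute = total // 2
    split = []
    for i in range(n_compute):
        server = i + n_compute
        compute_node = node_of_rank(i, n_nodes, procs_per_node, mapping)
        server_node = node_of_rank(server, n_nodes, procs_per_node, mapping)
        if compute_node != server_node:
            split.append((i, server))
    return split


if __name__ == "__main__":
    # Two nodes, 8 compute processes + 8 data servers per node = 16 per node.
    for mapping in ("slot", "node"):
        pairs = split_pairs(n_nodes=2, procs_per_node=16, mapping=mapping)
        print(f"{mapping}: {len(pairs)} of 16 pairs split across nodes")
```

Under these assumptions, block placement puts every compute rank on the first node and every data server on the second, while round-robin placement keeps each pair together. Whether this model matches what GAMESS (DDI) and Open MPI actually do on this cluster would still have to be checked against the DDI documentation and the mpirun mapping options.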