Which RPI were you using in 6.5.9? I ask because in the 6.x series LAM could have only one RPI compiled into it. The ability to choose your RPI at run time first appeared in the 7.x series.
I'm guessing that you are defaulting to the usysv RPI in 7.0.6. Because it uses shared memory for messages between processes on the same node, it *may* account for speed differences between your 6.x and 7.x runs (e.g., if you were using the tcp RPI in the 6.x series). That could in turn expose timing problems in your code.
The usysv RPI uses spin locks for on-node communication, so it should spin (and consume all the CPU) when it's waiting for on-node communication. But if you're blocking waiting for off-node communication, you won't see this spinning behavior.
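Here is a rough illustration of why spin-based waiting shows up as roughly 100% CPU in tools like gstat while a blocked off-node receive does not. This is a minimal, generic C11 sketch of a spin-wait on a shared flag. It is *not* LAM's usysv code. The names (`message_ready`, `spin_receive`, `spin_demo`) are hypothetical. The sketch assumes a compiler with `<stdatomic.h>` and POSIX threads.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical "message has arrived" flag standing in for a slot in
 * shared memory; NOT LAM's actual usysv data structure. */
static atomic_int message_ready = 0;
static int payload = 0;

/* Sender: write the payload, then publish it with a release store so
 * the receiver is guaranteed to see the payload once it sees the flag. */
static void *sender(void *arg)
{
    (void)arg;
    payload = 42;
    atomic_store_explicit(&message_ready, 1, memory_order_release);
    return NULL;
}

/* Receiver: busy-wait (spin) until the flag is set. Each iteration
 * re-reads the flag without sleeping or yielding, so the waiting thread
 * keeps its CPU fully busy the whole time it waits. */
static int spin_receive(unsigned long *spins)
{
    while (!atomic_load_explicit(&message_ready, memory_order_acquire)) {
        ++*spins;
    }
    return payload;
}

/* Run one send/receive round; returns the received value (or -1 if the
 * thread could not be created) and optionally reports how many times
 * the receiver spun while waiting. */
int spin_demo(unsigned long *spins_out)
{
    pthread_t tid;
    unsigned long spins = 0;

    atomic_store(&message_ready, 0);
    payload = 0;
    if (pthread_create(&tid, NULL, sender, NULL) != 0)
        return -1;
    int value = spin_receive(&spins);
    pthread_join(tid, NULL);
    if (spins_out != NULL)
        *spins_out = spins;
    return value;
}
```

Calling `spin_demo()` returns 42 once the sender thread has published the value. A receiver blocked in a kernel call (as with a TCP read for off-node traffic) sleeps instead of looping like this, which is why it does not show the same CPU usage.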
Can you attach a debugger to any of the processes and see what they are doing?
On Jan 13, 2005, at 11:36 AM, Yu Chen wrote:
Hello,
After installing OSCAR 4 on our RH-EL-AS-3 cluster, one of my major MPI programs is not running correctly. Here are the details; thanks in advance for any help:
In short, the program just sits there waiting and does nothing, whereas normally it produces a lot of output.
In detail, we have a 28-node cluster, including the master node, and each node has 2 CPUs.
Originally, I was running LAM-6.5.9 on Red Hat 7.2, using the PGI FORTRAN compiler and the GNU C compiler. The command used to run it is:
"mpirun -O -x CYANALIB c0,1,2,3,4,5,6,7,8,9,10,11,12 My_Program"
It ran fine. When I ran "gstat -a -1", I would see 6 nodes running at about 100% CPU time, since each had two copies running.
Now I am using OSCAR 4 (LAM-7.0.6) on RH-EL-AS-3 with all GNU compilers (C and FORTRAN), and I recompiled my program, BTW. With the same command, it starts, then just sits there doing nothing. "gstat -a -1" shows only 6 nodes running at about 50% CPU time, which looks like only one copy is running on each node. "mpitask" shows everything running.
Does anyone have any idea?
Regards, Chen
Yu Chen
Howard Hughes Medical Institute
Chemistry Building, Rm 182
University of Maryland at Baltimore County
1000 Hilltop Circle
Baltimore, MD 21250
phone: (410)455-6347 (primary), (410)455-2718 (secondary)
fax: (410)455-1174
email: [EMAIL PROTECTED]

This list is archived at http://www.lam-mpi.org/MailArchives/lam/
-- {+} Jeff Squyres {+} [EMAIL PROTECTED] {+} http://www.lam-mpi.org/
Oscar-users mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/oscar-users
