Thanks, Jeff, for your reply; you are always so helpful.

It's hard to say without more detail about your application; this could simply be your application's communication pattern causing blocking and making processes wait for message passing to complete, etc.
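(A classic example of such a pattern, purely hypothetical and not taken from your program: two ranks that both call blocking MPI_Send before posting a receive. Once the message is larger than the RPI's eager limit, each send blocks waiting for the other side's receive, which never gets posted:)

++++++++++++++++++++++++++++++++++++++++++++++++++++
#include <stdio.h>
#include "mpi.h"

#define N 100000   /* large enough to exceed typical eager limits */

int
main(int argc, char **argv) {

  int rank, peer;
  MPI_Status status;
  static double buf[N];

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  peer = 1 - rank;   /* assumes exactly 2 processes */

  /* Both ranks send first: for large messages each MPI_Send blocks
     until the peer posts a matching receive, so this deadlocks. */
  MPI_Send(buf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);
  MPI_Recv(buf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &status);

  printf("rank %d done (you will not see this)\n", rank);
  MPI_Finalize();
  return 0;
}
++++++++++++++++++++++++++++++++++++++++++++++++++++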

But that program worked in the previous setup, and it never got changed (the only difference is the FORTRAN compiler, PGI vs. GNU).



Which RPI were you using in 6.5.9? I ask because LAM could only have one RPI compiled into it back in the 6.x series; only in the 7.x series did we debut the ability to choose your RPI at run-time.

I was using "usysv" on 6.5.9.

I'm guessing that you should be defaulting to usysv in 7.0.6, which, since it uses shared memory for messages on the same node, *may* account for speed differences between your 6.x and 7.x runs (e.g., if you were using the tcp RPI in the 6.x series) and therefore expose timing problems in your code.

I used all defaults in 7.0.6 in OSCAR, so it should be usysv too.
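(If I remember the 7.x tools correctly, you can confirm this: "laminfo" lists the SSI modules your LAM install supports, and you can force a specific RPI at run time, e.g. "mpirun -ssi rpi tcp c0,1,2 My_Program", to see whether the hang behaves any differently under tcp than under usysv.)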


The usysv RPI uses spin locks for on-node communication, so it should spin (and consume all the CPU) when it's waiting for on-node communication. But if you're blocking waiting for off-node communication, you won't see this spinning behavior.
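(A small hypothetical test to see that spinning directly, assuming c0 and c1 land on the same 2-CPU node: rank 1 blocks in MPI_Recv for a minute while rank 0 sleeps before sending. Run it with something like "mpirun c0,1 spintest" and watch rank 1 in top; under a spinning RPI such as usysv it should sit near 100% CPU while it waits, while blocking off-node communication will not show that.)

++++++++++++++++++++++++++++++++++++++++++++++++++++
#include <stdio.h>
#include <unistd.h>   /* for sleep() */
#include "mpi.h"

int
main(int argc, char **argv) {

  int rank, token = 0;
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  if (rank == 0) {
    sleep(60);   /* keep the receiver waiting for a minute */
    MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
  } else if (rank == 1) {
    /* Watch this process in top: a spinning RPI burns CPU here. */
    MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
    printf("got the token\n");
  }

  MPI_Finalize();
  return 0;
}
++++++++++++++++++++++++++++++++++++++++++++++++++++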


Can you attach a debugger to any of the processes and see what they are doing?


I really don't know how to do that; could you help me with it?
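(The usual recipe, in case it helps: log into a node where the program is stuck, find the process ID with something like "ps aux | grep My_Program", then attach with "gdb My_Program <pid>". At the gdb prompt, "bt" prints a backtrace showing where the process is blocked; "detach" followed by "quit" lets it keep running. Compiling with -g makes the backtrace much more readable.)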

And I forgot to mention: I successfully ran the following hello-world.c program:

++++++++++++++++++++++++++++++++++++++++++++++++++++
#include <stdio.h>
#include <string.h>   /* for strcpy() */
#include "mpi.h"

int
main(int argc, char **argv) {

  int rank;
  char msg[20];

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  if (rank == 0) {
    /* Rank 0 fills the buffer and broadcasts it (12 chars + NUL = 13). */
    printf("I am the master.  I am sending the message.\n\n");
    strcpy(msg, "Hello World!");
    MPI_Bcast(msg, 13, MPI_CHAR, rank, MPI_COMM_WORLD);
  } else {
    /* All other ranks receive the broadcast from rank 0. */
    MPI_Bcast(msg, 13, MPI_CHAR, 0, MPI_COMM_WORLD);
    printf("I am the slave.  I am receiving the message.\n");
    printf("The message is: %s\n", msg);
  }

  MPI_Finalize();
  return 0;
}
+++++++++++++++++++++++++++++++++++++++++++++++++++++
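For reference, I compiled and launched it with something like:
mpicc hello-world.c -o hello-world
mpirun c0,1,2,3 hello-world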

Cheers,
Chen



On Jan 13, 2005, at 11:36 AM, Yu Chen wrote:

Hello,

After installing OSCAR 4 on a RH-EL-AS-3 cluster, one of my major MPI programs is not running right. Here are the details; thanks in advance for any help:

In short, the program just sits there, waiting and waiting but doing nothing, whereas normally it gives a lot of output.

In detail: we have a 28-node cluster (including the master node), each node with 2 CPUs.

Originally, I was running LAM-6.5.9 on Red Hat 7.2, using the PGI FORTRAN compiler and the GNU C compiler. The command used to run was:
"mpirun -O -x CYANALIB c0,1,2,3,4,5,6,7,8,9,10,11,12 My_Program"
It ran fine; when I ran "gstat -a -1", I would see 6 nodes running at about 100% CPU time, since each node had two copies running.


Now I am using OSCAR 4 (LAM-7.0.6) on RH-EL-AS-3 with all GNU compilers (C and FORTRAN); I recompiled my program, BTW. With the same command, it starts, then just sits there doing nothing. And "gstat -a -1" only shows 6 nodes running at about 50% CPU time, which looks like only one copy running per node. "mpitask" shows everything running.

Anyone got any ideas?

Regards
Chen






===========================================
Yu Chen
Howard Hughes Medical Institute
Chemistry Building, Rm 182
University of Maryland at Baltimore County
1000 Hilltop Circle
Baltimore, MD 21250

phone:  (410)455-6347 (primary)
        (410)455-2718 (secondary)
fax:    (410)455-1174
email:  [EMAIL PROTECTED]
===========================================


