Hi again,

Somehow the verbose flag (-v) did not work for me. I tried --debug-daemons and got:

[mpiuser@fantomfs40a ~]$ mpirun --debug-daemons -np 3 --machinefile /home/mpiuser/.mpi_hostfile ./test/mpihello
Daemon was launched on hp430a - beginning to initialize
Daemon [[34432,0],1] checking in as pid 3011 on host hp430a
<stuck here>

The program seems to hang right after the daemon on hp430a checks in. The secure log on hp430a shows that mpiuser logged in just fine:

tail /var/log/secure
Jun 7 17:07:31 hp430a sshd[3007]: Accepted publickey for mpiuser from port 34037 ssh2
Jun 7 17:07:31 hp430a sshd[3007]: pam_unix(sshd:session): session opened for user mpiuser by (uid=0)

Any idea what I should check next, and how?



On 6/7/12 4:38 PM, Duke wrote:
Hi Jingcha,

On 6/7/12 4:28 PM, Jingcha Joba wrote:
Hello Duke,
Welcome to the forum.
By default, Open MPI fills all the slots on one host before moving on to the next host.
Check this link for some info:
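
As a rough illustration of that by-slot behaviour, here is a minimal C sketch. It is not Open MPI code: `host_for_rank_byslot` is a hypothetical helper, and it assumes hosts are visited in hostfile order with no oversubscription. The slot counts 2, 4, 4 used in the note after the code are the ones from the hostfile quoted later in this thread.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of Open MPI): given the slot count of each
 * host in hostfile order, return the index of the host that a rank lands on
 * when every slot of one host is filled before moving to the next host.
 * Returns -1 when the rank does not fit into the total number of slots;
 * this sketch does not model oversubscription.
 */
int host_for_rank_byslot(const int *slots, int nhosts, int rank)
{
    assert(slots != NULL && nhosts > 0 && rank >= 0);

    for (int h = 0; h < nhosts; h++) {
        if (rank < slots[h])
            return h;          /* rank fits into the slots of host h */
        rank -= slots[h];      /* skip the slots already filled on host h */
    }
    return -1;                 /* more ranks than available slots */
}
```

With slots {2, 4, 4}, ranks 0 and 1 both map to host index 0, so a 2-process job stays entirely on the first host in the hostfile; rank 2 is the first one that maps to the second host. For round-robin placement across nodes, the mpirun man page of the 1.5 series documents a --bynode option; check it against your installed version.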

Thanks for the quick answer. I checked the FAQ and tried with more than 2 processes, but the run stalled:

[mpiuser@fantomfs40a ~]$ mpirun -v -np 4 --machinefile /home/mpiuser/.mpi_hostfile ./test/mpihello
^Cmpirun: killing job...

I tried the --host flag and it stalled as well:

[mpiuser@fantomfs40a ~]$ mpirun -v -np 4 --host hp430a,hp430b ./test/mpihello

My configuration must be wrong somewhere. Any idea how I can check the system?



On Thu, Jun 7, 2012 at 2:11 AM, Duke <duke.li...@gmx.com> wrote:

    Hi folks,

    Please be gentle with the newest member of the Open MPI list; I am
    totally new to this field. I just built a test cluster of 3 boxes
    running Scientific Linux 6.2 and Open MPI 1.5.3, and I wanted
    to test how the cluster works, but I can't figure out what is
    happening. On my master node, I have the hostfile:

    [mpiuser@fantomfs40a ~]$ cat .mpi_hostfile
    # The Hostfile for Open MPI
    fantomfs40a slots=2
    hp430a slots=4 max-slots=4
    hp430b slots=4 max-slots=4

    To test, I used the following C code:

    [mpiuser@fantomfs40a ~]$ cat test/mpihello.c
    /* program hello */
    /* Adapted from mpihello.f by drs */

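    #include <mpi.h>
    #include <stdio.h>
    #include <unistd.h>   /* gethostname */

    int main(int argc, char **argv)
    {
     int rank;
     char hostname[256];

     MPI_Init(&argc, &argv);                  /* start the MPI runtime */
     MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's rank */
     gethostname(hostname, sizeof(hostname)); /* name of the node we run on */
     printf("Hello world!  I am process number: %d on host %s\n",
            rank, hostname);
     MPI_Finalize();                          /* shut MPI down cleanly */
     return 0;
    }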

    and then compiled and ran:

    [mpiuser@fantomfs40a ~]$ mpicc -o test/mpihello test/mpihello.c
    [mpiuser@fantomfs40a ~]$ mpirun -np 2 --machinefile /home/mpiuser/.mpi_hostfile ./test/mpihello
    Hello world!  I am process number: 0 on host fantomfs40a
    Hello world!  I am process number: 1 on host fantomfs40a

    Unfortunately the result was not what I wanted. I expected
    to see something like:

    Hello world!  I am process number: 0 on host hp430a
    Hello world!  I am process number: 1 on host hp430b

    Does anybody have any idea what I am doing wrong?

    Thank you in advance,


    users mailing list
    us...@open-mpi.org
