Hi

I have been testing OpenMPI 1.2, and now 1.2.1, on several BProc-
based clusters, and I have run into some problems.  All my
clusters have standard Ethernet interconnects, either 100Base-T or
Gigabit, on standard switches.

The clusters all run Clustermatic 5 (BProc 4.x) and range from
32-bit Athlon to 32-bit Xeon to 64-bit Opteron hardware.  In every
case the same problems occur, identically.  I attach the output of
"ompi_info --all" and the config.log for my latest build, on an
Opteron cluster using the Pathscale compilers.  I had exactly
the same problems when using the vanilla GNU compilers.

Now for a description of the problem:

When running an MPI code (cpi.c, from the standard MPI examples, also
attached) with the mpirun defaults (i.e. -byslot) and a single
process, everything works:

        sonoma:dgruner{134}> mpirun -n 1 ./cpip
        [n17:30019] odls_bproc: openpty failed, using pipes instead
        Process 0 on n17
        pi is approximately 3.1415926544231341, Error is 0.0000000008333410
        wall clock time = 0.000199
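
For reference, here is a minimal sketch of what the attached cpi.c does
(the usual MPI example: each rank integrates a slice of 4/(1+x^2) over
[0,1] and the partial sums are combined with MPI_Reduce).  The attached
file is the one I actually ran; this sketch is only meant to show that
the test itself is trivial:

/* Minimal sketch along the lines of the attached cpi.c (standard MPI
 * example).  Each rank sums part of the midpoint rule for 4/(1+x^2)
 * and rank 0 collects the result.  Illustrative only. */
#include <math.h>
#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int n = 10000, myid, numprocs, i, namelen;
    double PI25DT = 3.141592653589793238462643;
    double mypi, pi, h, sum, x, startwtime = 0.0, endwtime;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Get_processor_name(processor_name, &namelen);

    printf("Process %d on %s\n", myid, processor_name);

    if (myid == 0)
        startwtime = MPI_Wtime();
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);   /* everyone gets n */

    h = 1.0 / (double) n;
    sum = 0.0;
    for (i = myid + 1; i <= n; i += numprocs) {     /* midpoint rule, strided by rank */
        x = h * ((double) i - 0.5);
        sum += 4.0 / (1.0 + x * x);
    }
    mypi = h * sum;

    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (myid == 0) {
        endwtime = MPI_Wtime();
        printf("pi is approximately %.16f, Error is %.16f\n", pi, fabs(pi - PI25DT));
        printf("wall clock time = %f\n", endwtime - startwtime);
    }
    MPI_Finalize();
    return 0;
}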

However, if one tries to run more than one process, this bombs:

        sonoma:dgruner{134}> mpirun -n 2 ./cpip
        .
        .
        .
        [n21:30029] OOB: Connection to HNP lost
        [n21:30029] OOB: Connection to HNP lost
        [n21:30029] OOB: Connection to HNP lost
        [n21:30029] OOB: Connection to HNP lost
        [n21:30029] OOB: Connection to HNP lost
        [n21:30029] OOB: Connection to HNP lost
        .
        . ad infinitum

If one uses the option "-bynode", things work:

        sonoma:dgruner{145}> mpirun -bynode -n 2 ./cpip
        [n17:30055] odls_bproc: openpty failed, using pipes instead
        Process 0 on n17
        Process 1 on n21
        pi is approximately 3.1415926544231318, Error is 0.0000000008333387
        wall clock time = 0.010375


Note that the message "openpty failed, using pipes instead" always
appears.

If I run more processes (on my 3-node cluster, with 2 CPUs per node), the
openpty message appears repeatedly for the first node:

        sonoma:dgruner{146}> mpirun -bynode -n 6 ./cpip
        [n17:30061] odls_bproc: openpty failed, using pipes instead
        [n17:30061] odls_bproc: openpty failed, using pipes instead
        Process 0 on n17
        Process 2 on n49
        Process 1 on n21
        Process 5 on n49
        Process 3 on n17
        Process 4 on n21
        pi is approximately 3.1415926544231239, Error is 0.0000000008333307
        wall clock time = 0.050332


Should I worry about the openpty failure?  I suspect that communications
may be slower this way.  Running with the -byslot option (the default)
always fails, so this looks like a bug.  The same thing happens with every
code I have tried, both simple and complex.
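
In case it helps narrow down the openpty question, here is a small
standalone check I put together (my own sketch, not something from the
Open MPI tree).  It just calls glibc's openpty() directly; it assumes
<pty.h> and linking against -lutil.  Run on a compute node (e.g. via
bpsh), it should show whether openpty() works there at all or fails the
way the odls_bproc message suggests:

/* Standalone openpty check -- my own sketch, not part of Open MPI.
 * Build with:  gcc -o ptytest ptytest.c -lutil
 * Run it on a compute node to see whether openpty() succeeds there. */
#include <pty.h>      /* openpty(), link with -lutil */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>

int main(void)
{
    int amaster, aslave;
    char name[64];

    if (openpty(&amaster, &aslave, name, NULL, NULL) < 0) {
        fprintf(stderr, "openpty failed: %s\n", strerror(errno));
        return 1;
    }
    printf("openpty ok, slave is %s\n", name);
    close(aslave);
    close(amaster);
    return 0;
}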

Thanks for your attention to this.
Regards,
Daniel
-- 

Dr. Daniel Gruner                        dgru...@chem.utoronto.ca
Dept. of Chemistry                       daniel.gru...@utoronto.ca
University of Toronto                    phone:  (416)-978-8689
80 St. George Street                     fax:    (416)-978-5325
Toronto, ON  M5S 3H6, Canada             finger for PGP public key

Attachment: cpi.c.gz
Description: GNU Zip compressed data

Attachment: config.log.gz
Description: GNU Zip compressed data

Attachment: ompiinfo.gz
Description: GNU Zip compressed data
