Hey Justin,

Please provide us with your MCA parameters (if any); they could be set in a config file, in environment variables, or on the command line.
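For reference, the three forms would look roughly like this (the btl value below is only an illustration, not a suggestion):

    # on the mpirun command line
    mpirun --mca btl gm,sm,self -np 4 ./a.out

    # as an environment variable
    export OMPI_MCA_btl=gm,sm,self

    # in $HOME/.openmpi/mca-params.conf
    btl = gm,sm,self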

Thanks,

Galen

On Jul 6, 2006, at 9:22 AM, Justin Bronder wrote:

As far as the nightly builds go, I'm still seeing what I believe to be
this problem in both r10670 and r10652.  This is happening with
both Linux and OS X.  Below are the systems and the ompi_info output for the
newest revision, r10670.

As an example of the error: when running HPL with Myrinet I get the
failure below, while using tcp everything is fine and I see the results
I'd expect.
------------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1 * N ) = 42820214496954887558164928727596662784.0000000 ...... FAILED
||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) = 156556068835.2711182 ...... FAILED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 1156439380.5172558 ...... FAILED
||Ax-b||_oo . . . . . . . . . . . . . . . . . = 272683853978565028754868928512.000000
||A||_oo . . . . . . . . . . . . . . . . . . . =        3822.884181
||A||_1  . . . . . . . . . . . . . . . . . . . =        3823.922627
||x||_oo . . . . . . . . . . . . . . . . . . . = 37037692483529688659798261760.000000
||x||_1  . . . . . . . . . . . . . . . . . . . = 4102704048669982798475494948864.000000
===================================================

Finished      1 tests with the following results:
              0 tests completed and passed residual checks,
              1 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
------------------------------------------------------------------------------
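(The exact mpirun invocations aren't shown above; as a rough sketch, and assuming typical BTL selections and an arbitrary process count, the two cases would be launched along these lines:

    # Myrinet/GM run -- fails the residual checks
    mpirun -np 4 --mca btl gm,sm,self ./xhpl

    # TCP run -- passes
    mpirun -np 4 --mca btl tcp,sm,self ./xhpl
)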

Linux node41 2.6.16.19 #1 SMP Wed Jun 21 17:22:01 EDT 2006 ppc64 PPC970FX, altivec supported GNU/Linux
jbronder@node41 ~ $ /usr/local/ompi-gnu-1.1.1a/bin/ompi_info
                Open MPI: 1.1.1a1r10670
   Open MPI SVN revision: r10670
                Open RTE: 1.1.1a1r10670
   Open RTE SVN revision: r10670
                    OPAL: 1.1.1a1r10670
       OPAL SVN revision: r10670
                  Prefix: /usr/local/ompi-gnu-1.1.1a
 Configured architecture: powerpc64-unknown-linux-gnu
           Configured by: root
           Configured on: Thu Jul  6 10:15:37 EDT 2006
          Configure host: node41
                Built by: root
                Built on: Thu Jul  6 10:28:14 EDT 2006
              Built host: node41
              C bindings: yes
            C++ bindings: yes
      Fortran77 bindings: yes (all)
      Fortran90 bindings: yes
 Fortran90 bindings size: small
              C compiler: gcc
     C compiler absolute: /usr/bin/gcc
            C++ compiler: g++
   C++ compiler absolute: /usr/bin/g++
      Fortran77 compiler: gfortran
  Fortran77 compiler abs: /usr/powerpc64-unknown-linux-gnu/gcc-bin/4.1.0/gfortran
      Fortran90 compiler: gfortran
  Fortran90 compiler abs: /usr/powerpc64-unknown-linux-gnu/gcc-bin/4.1.0/gfortran
             C profiling: yes
           C++ profiling: yes
     Fortran77 profiling: yes
     Fortran90 profiling: yes
          C++ exceptions: no
          Thread support: posix (mpi: no, progress: no)
  Internal debug support: no
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
         libltdl support: yes
               MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.1.1)
           MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1.1)
            MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1.1)
               MCA timer: linux (MCA v1.0, API v1.0, Component v1.1.1)
           MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
           MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
                MCA coll: basic (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1.1)
                MCA coll: self (MCA v1.0, API v1.0, Component v1.1.1)
                MCA coll: sm (MCA v1.0, API v1.0, Component v1.1.1)
                MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1.1)
                  MCA io: romio (MCA v1.0, API v1.0, Component v1.1.1)
               MCA mpool: gm (MCA v1.0, API v1.0, Component v1.1.1)
               MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1.1)
              MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA btl: gm (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA btl: self (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA btl: sm (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
                MCA topo: unity (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
                 MCA gpr: null (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1.1)
                  MCA gpr: replica (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA iof: svc (MCA v1.0, API v1.0, Component v1.1.1)
                  MCA ns: proxy (MCA v1.0, API v1.0, Component v1.1.1)
                   MCA ns: replica (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
                  MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.1.1)
                  MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.1.1)
                  MCA ras: localhost (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA ras: tm (MCA v1.0, API v1.0, Component v1.1.1)
                  MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.1.1)
                  MCA rds: resfile (MCA v1.0, API v1.0, Component v1.1.1)
                MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.1.1)
                MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.1.1)
                MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA rml: oob (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA pls: fork (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA pls: rsh (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA pls: tm (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA sds: env (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA sds: pipe (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA sds: seed (MCA v1.0, API v1.0, Component v1.1.1)
                  MCA sds: singleton (MCA v1.0, API v1.0, Component v1.1.1)
Configured as:
./configure \
    --prefix=$PREFIX \
    --enable-mpi-f77 \
    --enable-mpi-f90 \
    --enable-mpi-profile \
    --enable-mpi-cxx \
    --enable-pty-support \
    --enable-shared \
    --enable-smp-locks \
    --enable-io-romio \
    --with-tm=/usr/local/pbs \
    --without-xgrid \
    --without-slurm \
    --with-gm=/opt/gm

Darwin node90.meldrew.clusters.umaine.edu 8.6.0 Darwin Kernel Version 8.6.0: Tue Mar 7 16:58:48 PST 2006; root:xnu-792.6.70.obj~1/RELEASE_PPC Power Macintosh powerpc
node90:~/src/hpl jbronder$ /usr/local/ompi-xl/bin/ompi_info
                Open MPI: 1.1.1a1r10670
   Open MPI SVN revision: r10670
                Open RTE: 1.1.1a1r10670
   Open RTE SVN revision: r10670
                    OPAL: 1.1.1a1r10670
       OPAL SVN revision: r10670
                  Prefix: /usr/local/ompi-xl
 Configured architecture: powerpc-apple-darwin8.6.0
           Configured by:
           Configured on: Thu Jul  6 10:05:20 EDT 2006
          Configure host: node90.meldrew.clusters.umaine.edu
                Built by: root
                Built on: Thu Jul  6 10:37:40 EDT 2006
              Built host: node90.meldrew.clusters.umaine.edu
              C bindings: yes
            C++ bindings: yes
      Fortran77 bindings: yes (lower case)
      Fortran90 bindings: yes
 Fortran90 bindings size: small
              C compiler: /opt/ibmcmp/vac/6.0/bin/xlc
     C compiler absolute: /opt/ibmcmp/vac/6.0/bin/xlc
            C++ compiler: /opt/ibmcmp/vacpp/6.0/bin/xlc++
   C++ compiler absolute: /opt/ibmcmp/vacpp/6.0/bin/xlc++
      Fortran77 compiler: /opt/ibmcmp/xlf/8.1/bin/xlf_r
  Fortran77 compiler abs: /opt/ibmcmp/xlf/8.1/bin/xlf_r
      Fortran90 compiler: /opt/ibmcmp/xlf/8.1/bin/xlf90_r
  Fortran90 compiler abs: /opt/ibmcmp/xlf/8.1/bin/xlf90_r
             C profiling: yes
           C++ profiling: yes
     Fortran77 profiling: yes
     Fortran90 profiling: yes
          C++ exceptions: no
          Thread support: posix (mpi: no, progress: no)
  Internal debug support: no
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
         libltdl support: yes
               MCA memory: darwin (MCA v1.0, API v1.0, Component v1.1.1)
            MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1.1)
                MCA timer: darwin (MCA v1.0, API v1.0, Component v1.1.1)
           MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
           MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
                 MCA coll: basic (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1.1)
                MCA coll: self (MCA v1.0, API v1.0, Component v1.1.1)
                MCA coll: sm (MCA v1.0, API v1.0, Component v1.1.1)
                MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1.1)
                  MCA io: romio (MCA v1.0, API v1.0, Component v1.1.1)
               MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1.1)
               MCA mpool: gm (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1.1)
              MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA btl: self (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA btl: sm (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA btl: gm (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
                MCA topo: unity (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
                 MCA gpr: null (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1.1)
                  MCA gpr: replica (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA iof: svc (MCA v1.0, API v1.0, Component v1.1.1)
                  MCA ns: proxy (MCA v1.0, API v1.0, Component v1.1.1)
                   MCA ns: replica (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
                  MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.1.1)
                  MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.1.1)
                  MCA ras: localhost (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA ras: tm (MCA v1.0, API v1.0, Component v1.1.1)
                  MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.1.1)
                  MCA rds: resfile (MCA v1.0, API v1.0, Component v1.1.1)
                MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.1.1)
                MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.1.1)
                MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA rml: oob (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA pls: fork (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA pls: rsh (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA pls: tm (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA sds: env (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA sds: seed (MCA v1.0, API v1.0, Component v1.1.1)
                  MCA sds: singleton (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA sds: pipe (MCA v1.0, API v1.0, Component v1.1.1)
Configured as:
./configure \
    --prefix=$PREFIX \
    --with-tm=/usr/local/pbs/ \
    --with-gm=/opt/gm \
    --enable-static \
    --disable-cxx
On 7/3/06, George Bosilca <bosi...@cs.utk.edu> wrote:
Bernard,

A bug in the Open MPI GM driver was discovered after the 1.1 release.
A patch for 1.1 is on the way; however, I don't know whether it will
be available before 1.1.1. Meanwhile, you can use a nightly build
or a fresh checkout from the SVN repository. Both of
them have the GM bug corrected.
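For example (the URLs and configure arguments below are the usual ones and are only meant as a sketch; adjust for your site):

    # fresh checkout of the development trunk
    svn co http://svn.open-mpi.org/svn/ompi/trunk ompi-trunk

    # or pick up a nightly tarball from http://www.open-mpi.org/nightly/
    # and then build as usual, e.g.
    ./configure --prefix=$PREFIX --with-gm=/opt/gm && make all install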

   Sorry for the troubles,
     george.

On Jul 3, 2006, at 12:58 PM, Borenstein, Bernard S wrote:

> I've built and successfully run the NASA Overflow 2.0aa program with
> Open MPI 1.0.2. I'm running on an Opteron Linux cluster with SLES 9
> and GM 2.0.24. I built Open MPI 1.1 with the Intel 9 compilers, and
> when I try to run Overflow 2.0aa with Myrinet, I get what looks like
> a data corruption error and the program dies quickly. There are no
> MPI errors at all. If I run using GigE (--mca btl self,tcp), the
> program runs to completion correctly.  Here is my ompi_info output:
>
> bsb3227@mahler:~/openmpi_1.1/bin> ./ompi_info
>                 Open MPI: 1.1
>    Open MPI SVN revision: r10477
>                 Open RTE: 1.1
>    Open RTE SVN revision: r10477
>                     OPAL: 1.1
>        OPAL SVN revision: r10477
>                   Prefix: /home/bsb3227/openmpi_1.1
>  Configured architecture: x86_64-unknown-linux-gnu
>            Configured by: bsb3227
>            Configured on: Fri Jun 30 07:08:54 PDT 2006
>           Configure host: mahler
>                 Built by: bsb3227
>                 Built on: Fri Jun 30 07:54:46 PDT 2006
>               Built host: mahler
>               C bindings: yes
>             C++ bindings: yes
>       Fortran77 bindings: yes (all)
>       Fortran90 bindings: yes
>  Fortran90 bindings size: small
>               C compiler: icc
>      C compiler absolute: /opt/intel/cce/9.0.25/bin/icc
>             C++ compiler: icpc
>    C++ compiler absolute: /opt/intel/cce/9.0.25/bin/icpc
>       Fortran77 compiler: ifort
>   Fortran77 compiler abs: /opt/intel/fce/9.0.25/bin/ifort
>       Fortran90 compiler: /opt/intel/fce/9.0.25/bin/ifort
>   Fortran90 compiler abs: /opt/intel/fce/9.0.25/bin/ifort
>              C profiling: yes
>            C++ profiling: yes
>      Fortran77 profiling: yes
>      Fortran90 profiling: yes
>           C++ exceptions: no
>           Thread support: posix (mpi: no, progress: no)
>   Internal debug support: no
>      MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
>          libltdl support: yes
>               MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.1)
>            MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1)
>            MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1)
>            MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.1)
>                MCA timer: linux (MCA v1.0, API v1.0, Component v1.1)
>            MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
>            MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
>                 MCA coll: basic (MCA v1.0, API v1.0, Component v1.1)
>                 MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1)
>                 MCA coll: self (MCA v1.0, API v1.0, Component v1.1)
>                 MCA coll: sm (MCA v1.0, API v1.0, Component v1.1)
>                 MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1)
>                   MCA io: romio (MCA v1.0, API v1.0, Component v1.1)
>                MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1)
>                MCA mpool: gm (MCA v1.0, API v1.0, Component v1.1)
>                  MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1)
>                  MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1)
>               MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1)
>                  MCA btl: self (MCA v1.0, API v1.0, Component v1.1)
>                  MCA btl: sm (MCA v1.0, API v1.0, Component v1.1)
>                  MCA btl: gm (MCA v1.0, API v1.0, Component v1.1)
>                  MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
>                 MCA topo: unity (MCA v1.0, API v1.0, Component v1.1)
>                  MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
>                  MCA gpr: null (MCA v1.0, API v1.0, Component v1.1)
>                  MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1)
>                  MCA gpr: replica (MCA v1.0, API v1.0, Component v1.1)
>                  MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1)
>                  MCA iof: svc (MCA v1.0, API v1.0, Component v1.1)
>                   MCA ns: proxy (MCA v1.0, API v1.0, Component v1.1)
>                   MCA ns: replica (MCA v1.0, API v1.0, Component v1.1)
>                  MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
>                  MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.1)
>                  MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.1)
>                  MCA ras: localhost (MCA v1.0, API v1.0, Component v1.1)
>                  MCA ras: slurm (MCA v1.0, API v1.0, Component v1.1)
>                  MCA ras: tm (MCA v1.0, API v1.0, Component v1.1)
>                  MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.1)
>                  MCA rds: resfile (MCA v1.0, API v1.0, Component v1.1)
>                MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.1)
>                 MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.1)
>                 MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.1)
>                  MCA rml: oob (MCA v1.0, API v1.0, Component v1.1)
>                  MCA pls: fork (MCA v1.0, API v1.0, Component v1.1)
>                  MCA pls: rsh (MCA v1.0, API v1.0, Component v1.1)
>                  MCA pls: slurm (MCA v1.0, API v1.0, Component v1.1)
>                  MCA pls: tm (MCA v1.0, API v1.0, Component v1.1)
>                  MCA sds: env (MCA v1.0, API v1.0, Component v1.1)
>                  MCA sds: seed (MCA v1.0, API v1.0, Component v1.1)
>                  MCA sds: singleton (MCA v1.0, API v1.0, Component v1.1)
>                  MCA sds: pipe (MCA v1.0, API v1.0, Component v1.1)
>                  MCA sds: slurm (MCA v1.0, API v1.0, Component v1.1)
>
> Here is the ifconfig for one of the nodes :
>
> bsb3227@m045:~> /sbin/ifconfig
> eth0      Link encap:Ethernet  HWaddr 00:50:45:5D:CD:FE
>           inet addr:10.241.194.45  Bcast:10.241.195.255  Mask:255.255.254.0
>           inet6 addr: fe80::250:45ff:fe5d:cdfe/64 Scope:Link
>           UP BROADCAST NOTRAILERS RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:39913407 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:48794587 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:31847343907 (30371.9 Mb)  TX bytes:48231713866 (45997.3 Mb)
>           Interrupt:19
>
> eth1      Link encap:Ethernet  HWaddr 00:50:45:5D:CD:FF
>           inet6 addr: fe80::250:45ff:fe5d:cdff/64 Scope:Link
>           UP BROADCAST MULTICAST  MTU:1500  Metric:1
>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
>           Interrupt:19
>
> lo        Link encap:Local Loopback
>           inet addr:127.0.0.1  Mask:255.0.0.0
>           inet6 addr: ::1/128 Scope:Host
>           UP LOOPBACK RUNNING  MTU:16436  Metric:1
>           RX packets:23141 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:23141 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:20145689 (19.2 Mb)  TX bytes:20145689 (19.2 Mb)
>
> I hope someone can give me some guidance on how to debug this problem.
> Thanx in advance for any help
> that can be provided.
>
> Bernie Borenstein
> The Boeing Company
> <config.log.gz>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

"Half of what I say is meaningless; but I say it so that the other
half may reach you"
                                   Kahlil Gibran

