us...@open-mpi.org
To subscribe or unsubscribe via the World Wide Web, visit
http://www.open-mpi.org/mailman/listinfo.cgi/users
or, via email, send a message with subject or body 'help' to
users-requ...@open-mpi.org
You can reach the person managing the list at
users-ow...@open-mpi.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of users digest..."
Today's Topics:
1. Re: Installing OpenMPI on a solaris (Jeff Squyres (jsquyres))
----------------------------------------------------------------------
Message: 1
Date: Wed, 28 Jun 2006 08:56:36 -0400
From: "Jeff Squyres \(jsquyres\)" <jsquy...@cisco.com>
Subject: Re: [OMPI users] Installing OpenMPI on a solaris
To: "Open MPI Users" <us...@open-mpi.org>
Message-ID:
<c835b9c9cb0f1c4e9da48913c9e8f8afae9...@xmb-rtp-215.amer.cisco.com>
Content-Type: text/plain; charset="iso-8859-1"
Bummer! :-(
Just to be sure -- you had a clean config.cache file before you ran configure, right? (That is, the file didn't exist, so configure couldn't pick up potentially erroneous values from a previous run.) Also, FWIW, it's not necessary to specify --enable-ltdl-convenience; that should be automatic.
If you had a clean configure, we *suspect* that this might be due to alignment issues on 64-bit Solaris platforms. We thought we had a pretty good handle on those in 1.1, but obviously we didn't solve everything. Bonk.
Did you get a corefile, perchance? If you could send a stack trace, that would be most helpful.
________________________________
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Eric Thibodeau
Sent: Tuesday, June 20, 2006 8:36 PM
To: us...@open-mpi.org
Subject: Re: [OMPI users] Installing OpenMPI on a solaris
Hello Brian (and all),
Well, the joy was short-lived. On a 12 CPU Enterprise machine and on a
4 CPU one, I seem to be able to start up to 4 processes. Above 4, I seem to
inevitably get BUS_ADRALN (Bus collisions?). Below are some traces of the
failing runs as well as a detailed trace (mpirun -d) of one of these situations and
ompi_info output. Obviously, don't hesitate to ask if more information is
required.
Build version: openmpi-1.1b5r10421
Config parameters:
Open MPI config.status 1.1b5
configured by ./configure, generated by GNU Autoconf 2.59,
with options \"'--cache-file=config.cache' 'CFLAGS=-mcpu=v9'
'CXXFLAGS=-mcpu=v9' 'FFLAGS=-mcpu=v9'
'--prefix=/export/lca/home/lca0/etudiants/ac38820/openmpi_sun4u'
--enable-ltdl-convenience\"
The traces:
sshd@enterprise ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2 $
~/openmpi_sun4u/bin/mpirun -np 10 mandelbrot-mpi 100 400 400
Signal:10 info.si_errno:0(Error 0) si_code:1(BUS_ADRALN)
Failing at addr:2f4f04
*** End of error message ***
sshd@enterprise ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2 $
~/openmpi_sun4u/bin/mpirun -np 8 mandelbrot-mpi 100 400 400
Signal:10 info.si_errno:0(Error 0) si_code:1(BUS_ADRALN)
Failing at addr:2b354c
*** End of error message ***
sshd@enterprise ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2 $
~/openmpi_sun4u/bin/mpirun -np 6 mandelbrot-mpi 100 400 400
Signal:10 info.si_errno:0(Error 0) si_code:1(BUS_ADRALN)
Failing at addr:2b1ecc
*** End of error message ***
sshd@enterprise ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2 $
~/openmpi_sun4u/bin/mpirun -np 5 mandelbrot-mpi 100 400 400
Signal:10 info.si_errno:0(Error 0) si_code:1(BUS_ADRALN)
Failing at addr:2b12cc
*** End of error message ***
sshd@enterprise ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2 $
~/openmpi_sun4u/bin/mpirun -np 4 mandelbrot-mpi 100 400 400
maxiter = 100, width = 400, height = 400
execution time in seconds = 1.48
Taper q pour quitter le programme, autrement, on fait un refresh
q
sshd@enterprise ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2 $
~/openmpi_sun4u/bin/mpirun -np 5 mandelbrot-mpi 100 400 400
Signal:10 info.si_errno:0(Error 0) si_code:1(BUS_ADRALN)
Failing at addr:2b12cc
*** End of error message ***
I also got the same behaviour on a different machine (with the exact
same code base, since $HOME is an NFS mount) with the same hardware but limited to 4 CPUs.
The following is a debug run of one such failing execution:
sshd@enterprise ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2 $
~/openmpi_sun4u/bin/mpirun -d -v -np 5 mandelbrot-mpi 100 400 400
[enterprise:24786] [0,0,0] setting up session dir with
[enterprise:24786] universe default-universe
[enterprise:24786] user sshd
[enterprise:24786] host enterprise
[enterprise:24786] jobid 0
[enterprise:24786] procid 0
[enterprise:24786] procdir:
/tmp/openmpi-sessions-sshd@enterprise_0/default-universe/0/0
[enterprise:24786] jobdir:
/tmp/openmpi-sessions-sshd@enterprise_0/default-universe/0
[enterprise:24786] unidir:
/tmp/openmpi-sessions-sshd@enterprise_0/default-universe
[enterprise:24786] top: openmpi-sessions-sshd@enterprise_0
[enterprise:24786] tmp: /tmp
[enterprise:24786] [0,0,0] contact_file
/tmp/openmpi-sessions-sshd@enterprise_0/default-universe/universe-setup.txt
[enterprise:24786] [0,0,0] wrote setup file
[enterprise:24786] pls:rsh: local csh: 0, local bash: 0
[enterprise:24786] pls:rsh: assuming same remote shell as local shell
[enterprise:24786] pls:rsh: remote csh: 0, remote bash: 0
[enterprise:24786] pls:rsh: final template argv:
[enterprise:24786] pls:rsh: /usr/local/bin/ssh <template> ( ! [ -e ./.profile ] || . ./.profile; orted
--debug --bootproxy 1 --name <template> --num_procs 2 --vpid_start 0 --nodename <template> --universe
sshd@enterprise:default-universe --nsreplica "0.0.0;tcp://10.45.117.37:40236" --gprreplica
"0.0.0;tcp://10.45.117.37:40236" --mpi-call-yield 0 )
[enterprise:24786] pls:rsh: launching on node localhost
[enterprise:24786] pls:rsh: oversubscribed -- setting
mpi_yield_when_idle to 1 (1 5)
[enterprise:24786] pls:rsh: localhost is a LOCAL node
[enterprise:24786] pls:rsh: reset PATH:
/export/lca/home/lca0/etudiants/ac38820/openmpi_sun4u/bin:/bin:/usr/local/bin:/usr/bin:/usr/sbin:/usr/ccs/bin:/usr/dt/bin:/usr/local/lam-mpi/7.1.1/bin:/export/lca/appl/Forte/SUNWspro/WS6U2/bin:/opt/sfw/bin:/usr/bin:/usr/ucb:/etc:/usr/local/bin:.
[enterprise:24786] pls:rsh: reset LD_LIBRARY_PATH:
/export/lca/home/lca0/etudiants/ac38820/openmpi_sun4u/lib:/export/lca/appl/Forte/SUNWspro/WS6U2/lib:/usr/local/lib:/usr/local/lam-mpi/7.1.1/lib:/opt/sfw/lib
[enterprise:24786] pls:rsh: changing to directory
/export/lca/home/lca0/etudiants/ac38820
[enterprise:24786] pls:rsh: executing: orted --debug --bootproxy 1 --name 0.0.1 --num_procs
2 --vpid_start 0 --nodename localhost --universe sshd@enterprise:default-universe --nsreplica
"0.0.0;tcp://10.45.117.37:40236" --gprreplica "0.0.0;tcp://10.45.117.37:40236"
--mpi-call-yield 1
[enterprise:24787] [0,0,1] setting up session dir with
[enterprise:24787] universe default-universe
[enterprise:24787] user sshd
[enterprise:24787] host localhost
[enterprise:24787] jobid 0
[enterprise:24787] procid 1
[enterprise:24787] procdir:
/tmp/openmpi-sessions-sshd@localhost_0/default-universe/0/1
[enterprise:24787] jobdir:
/tmp/openmpi-sessions-sshd@localhost_0/default-universe/0
[enterprise:24787] unidir:
/tmp/openmpi-sessions-sshd@localhost_0/default-universe
[enterprise:24787] top: openmpi-sessions-sshd@localhost_0
[enterprise:24787] tmp: /tmp
[enterprise:24789] [0,1,0] setting up session dir with
[enterprise:24789] universe default-universe
[enterprise:24789] user sshd
[enterprise:24789] host localhost
[enterprise:24789] jobid 1
[enterprise:24789] procid 0
[enterprise:24789] procdir:
/tmp/openmpi-sessions-sshd@localhost_0/default-universe/1/0
[enterprise:24789] jobdir:
/tmp/openmpi-sessions-sshd@localhost_0/default-universe/1
[enterprise:24789] unidir:
/tmp/openmpi-sessions-sshd@localhost_0/default-universe
[enterprise:24789] top: openmpi-sessions-sshd@localhost_0
[enterprise:24789] tmp: /tmp
[enterprise:24791] [0,1,1] setting up session dir with
[enterprise:24791] universe default-universe
[enterprise:24791] user sshd
[enterprise:24791] host localhost
[enterprise:24791] jobid 1
[enterprise:24791] procid 1
[enterprise:24791] procdir:
/tmp/openmpi-sessions-sshd@localhost_0/default-universe/1/1
[enterprise:24791] jobdir:
/tmp/openmpi-sessions-sshd@localhost_0/default-universe/1
[enterprise:24791] unidir:
/tmp/openmpi-sessions-sshd@localhost_0/default-universe
[enterprise:24791] top: openmpi-sessions-sshd@localhost_0
[enterprise:24791] tmp: /tmp
[enterprise:24793] [0,1,2] setting up session dir with
[enterprise:24793] universe default-universe
[enterprise:24793] user sshd
[enterprise:24793] host localhost
[enterprise:24793] jobid 1
[enterprise:24793] procid 2
[enterprise:24793] procdir:
/tmp/openmpi-sessions-sshd@localhost_0/default-universe/1/2
[enterprise:24793] jobdir:
/tmp/openmpi-sessions-sshd@localhost_0/default-universe/1
[enterprise:24793] unidir:
/tmp/openmpi-sessions-sshd@localhost_0/default-universe
[enterprise:24793] top: openmpi-sessions-sshd@localhost_0
[enterprise:24793] tmp: /tmp
[enterprise:24795] [0,1,3] setting up session dir with
[enterprise:24795] universe default-universe
[enterprise:24795] user sshd
[enterprise:24795] host localhost
[enterprise:24795] jobid 1
[enterprise:24795] procid 3
[enterprise:24795] procdir:
/tmp/openmpi-sessions-sshd@localhost_0/default-universe/1/3
[enterprise:24795] jobdir:
/tmp/openmpi-sessions-sshd@localhost_0/default-universe/1
[enterprise:24795] unidir:
/tmp/openmpi-sessions-sshd@localhost_0/default-universe
[enterprise:24795] top: openmpi-sessions-sshd@localhost_0
[enterprise:24795] tmp: /tmp
[enterprise:24797] [0,1,4] setting up session dir with
[enterprise:24797] universe default-universe
[enterprise:24797] user sshd
[enterprise:24797] host localhost
[enterprise:24797] jobid 1
[enterprise:24797] procid 4
[enterprise:24797] procdir:
/tmp/openmpi-sessions-sshd@localhost_0/default-universe/1/4
[enterprise:24797] jobdir:
/tmp/openmpi-sessions-sshd@localhost_0/default-universe/1
[enterprise:24797] unidir:
/tmp/openmpi-sessions-sshd@localhost_0/default-universe
[enterprise:24797] top: openmpi-sessions-sshd@localhost_0
[enterprise:24797] tmp: /tmp
[enterprise:24786] spawn: in job_state_callback(jobid = 1, state = 0x4)
[enterprise:24786] Info: Setting up debugger process table for
applications
MPIR_being_debugged = 0
MPIR_debug_gate = 0
MPIR_debug_state = 1
MPIR_acquired_pre_main = 0
MPIR_i_am_starter = 0
MPIR_proctable_size = 5
MPIR_proctable:
(i, host, exe, pid) = (0, localhost, mandelbrot-mpi, 24789)
(i, host, exe, pid) = (1, localhost, mandelbrot-mpi, 24791)
(i, host, exe, pid) = (2, localhost, mandelbrot-mpi, 24793)
(i, host, exe, pid) = (3, localhost, mandelbrot-mpi, 24795)
(i, host, exe, pid) = (4, localhost, mandelbrot-mpi, 24797)
[enterprise:24789] [0,1,0] ompi_mpi_init completed
[enterprise:24791] [0,1,1] ompi_mpi_init completed
[enterprise:24793] [0,1,2] ompi_mpi_init completed
[enterprise:24795] [0,1,3] ompi_mpi_init completed
[enterprise:24797] [0,1,4] ompi_mpi_init completed
Signal:10 info.si_errno:0(Error 0) si_code:1(BUS_ADRALN)
Failing at addr:2b12cc
*** End of error message ***
[enterprise:24787] sess_dir_finalize: found proc session dir empty -
deleting
[enterprise:24787] sess_dir_finalize: job session dir not empty -
leaving
[enterprise:24787] orted: job_state_callback(jobid = 1, state =
ORTE_PROC_STATE_ABORTED)
[enterprise:24787] sess_dir_finalize: found job session dir empty -
deleting
[enterprise:24787] sess_dir_finalize: univ session dir not empty -
leaving
--------------------------------------------------------------------------
WARNING: A process refused to die!
Host: enterprise
PID: 24789
This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: A process refused to die!
Host: enterprise
PID: 24791
This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: A process refused to die!
Host: enterprise
PID: 24793
This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: A process refused to die!
Host: enterprise
PID: 24795
This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: A process refused to die!
Host: enterprise
PID: 24797
This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: A process refused to die!
Host: enterprise
PID: 24789
This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: A process refused to die!
Host: enterprise
PID: 24791
This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: A process refused to die!
Host: enterprise
PID: 24793
This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: A process refused to die!
Host: enterprise
PID: 24795
This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: A process refused to die!
Host: enterprise
PID: 24797
This process may still be running and/or consuming resources.
--------------------------------------------------------------------------
[enterprise:24787] sess_dir_finalize: proc session dir not empty -
leaving
[enterprise:24787] sess_dir_finalize: proc session dir not empty -
leaving
[enterprise:24787] sess_dir_finalize: proc session dir not empty -
leaving
[enterprise:24787] sess_dir_finalize: proc session dir not empty -
leaving
[enterprise:24787] orted: job_state_callback(jobid = 1, state =
ORTE_PROC_STATE_TERMINATED)
[enterprise:24787] sess_dir_finalize: found proc session dir empty -
deleting
[enterprise:24787] sess_dir_finalize: found job session dir empty -
deleting
[enterprise:24787] sess_dir_finalize: found univ session dir empty -
deleting
[enterprise:24787] sess_dir_finalize: found top session dir empty -
deleting
ompi_info output:
sshd@enterprise ~ $ ~/openmpi_sun4u/bin/ompi_info
Open MPI: 1.1b5r10421
Open MPI SVN revision: r10421
Open RTE: 1.1b5r10421
Open RTE SVN revision: r10421
OPAL: 1.1b5r10421
OPAL SVN revision: r10421
Prefix: /export/lca/home/lca0/etudiants/ac38820/openmpi_sun4u
Configured architecture: sparc-sun-solaris2.8
Configured by: sshd
Configured on: Tue Jun 20 15:25:44 EDT 2006
Configure host: averoes
Built by: ac38820
Built on: Tue Jun 20 15:59:47 EDT 2006
Built host: averoes
C bindings: yes
C++ bindings: yes
Fortran77 bindings: yes (all)
Fortran90 bindings: no
Fortran90 bindings size: na
C compiler: gcc
C compiler absolute: /usr/local/bin/gcc
C++ compiler: g++
C++ compiler absolute: /usr/local/bin/g++
Fortran77 compiler: g77
Fortran77 compiler abs: /usr/local/bin/g77
Fortran90 compiler: f90
Fortran90 compiler abs: /export/lca/appl/Forte/SUNWspro/WS6U2/bin/f90
C profiling: yes
C++ profiling: yes
Fortran77 profiling: yes
Fortran90 profiling: no
C++ exceptions: no
Thread support: solaris (mpi: no, progress: no)
Internal debug support: no
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
libltdl support: yes
MCA paffinity: solaris (MCA v1.0, API v1.0, Component v1.1)
MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1)
MCA timer: solaris (MCA v1.0, API v1.0, Component v1.1)
MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
MCA coll: basic (MCA v1.0, API v1.0, Component v1.1)
MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1)
MCA coll: self (MCA v1.0, API v1.0, Component v1.1)
MCA coll: sm (MCA v1.0, API v1.0, Component v1.1)
MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1)
MCA io: romio (MCA v1.0, API v1.0, Component v1.1)
MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1)
MCA pml: dr (MCA v1.0, API v1.0, Component v1.1)
MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1)
MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1)
MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1)
MCA btl: self (MCA v1.0, API v1.0, Component v1.1)
MCA btl: sm (MCA v1.0, API v1.0, Component v1.1)
MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
MCA topo: unity (MCA v1.0, API v1.0, Component v1.1)
MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
MCA gpr: null (MCA v1.0, API v1.0, Component v1.1)
MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1)
MCA gpr: replica (MCA v1.0, API v1.0, Component v1.1)
MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1)
MCA iof: svc (MCA v1.0, API v1.0, Component v1.1)
MCA ns: proxy (MCA v1.0, API v1.0, Component v1.1)
MCA ns: replica (MCA v1.0, API v1.0, Component v1.1)
MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.1)
MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.1)
MCA ras: localhost (MCA v1.0, API v1.0, Component v1.1)
MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.1)
MCA rds: resfile (MCA v1.0, API v1.0, Component v1.1)
MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.1)
MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.1)
MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.1)
MCA rml: oob (MCA v1.0, API v1.0, Component v1.1)
MCA pls: fork (MCA v1.0, API v1.0, Component v1.1)
MCA pls: rsh (MCA v1.0, API v1.0, Component v1.1)
MCA sds: env (MCA v1.0, API v1.0, Component v1.1)
MCA sds: pipe (MCA v1.0, API v1.0, Component v1.1)
MCA sds: seed (MCA v1.0, API v1.0, Component v1.1)
MCA sds: singleton (MCA v1.0, API v1.0, Component v1.1)
On Tuesday, June 20, 2006 at 17:06, Eric Thibodeau wrote:
> Thanks for the pointer, it WORKS!! (yay)
>
> On Tuesday, June 20, 2006 at 12:21, Brian Barrett wrote:
> > On Jun 19, 2006, at 12:15 PM, Eric Thibodeau wrote:
> >
> > > I checked the thread with the same title as this e-mail and tried
> > > compiling openmpi-1.1b4r10418 with:
> > >
> > > ./configure CFLAGS="-mv8plus" CXXFLAGS="-mv8plus" FFLAGS="-mv8plus"
> > > FCFLAGS="-mv8plus" --prefix=$HOME/openmpi-SUN-`uname -r`
> > > --enable-pretty-print-stacktrace
> > I put the incorrect flags in the error message - can you try again with:
> >
> >
> > ./configure CFLAGS=-mcpu=v9 CXXFLAGS=-mcpu=v9 FFLAGS=-mcpu=v9
> > FCFLAGS=-mcpu=v9 --prefix=$HOME/openmpi-SUN-`uname -r`
> > --enable-pretty-print-stacktrace
> >
> >
> > and see if that helps? By the way, I'm not sure if Solaris has the
> > required support for the pretty-print stack trace feature. It likely
> > will print what signal caused the error, but will not actually print
> > the stack trace. It's enabled by default on Solaris, with this
> > limited functionality (the option exists for platforms that have
> > broken half-support for GNU libc's stack trace feature, and for users
> > that don't like us registering a signal handler to do the work).
> >
> > Brian
> >
> >
>
--
Eric Thibodeau
Neural Bucket Solutions Inc.
T. (514) 736-1436
C. (514) 710-0517
End of users Digest, Vol 317, Issue 4
*************************************