mpirun opens a separate shell on each machine/node, so the "ulimit" setting will
not be available in the new shell. I think that if you add "ulimit -c
unlimited" to your default shell configuration file (~/.bashrc in the BASH
case and ~/.tcshrc in the TCSH/CSH case) you will find your core files :)
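For example, a sketch assuming bash (the ~/.bashrc path is the usual default; adjust for your shell):

```shell
# Sketch, assuming bash: append the setting to ~/.bashrc so the
# non-interactive shells that mpirun spawns over ssh also allow core dumps.
# (For tcsh/csh the equivalent line in ~/.tcshrc is "limit coredumpsize unlimited".)
grep -qx 'ulimit -c unlimited' ~/.bashrc || echo 'ulimit -c unlimited' >> ~/.bashrc
ulimit -c unlimited   # also enable it in the current shell
ulimit -c             # prints "unlimited" once the limit is raised
```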
Regards,
Pavel Shamis (Pasha)
Adams Samuel D Contr AFRL/HEDR wrote:
I set bash to have unlimited size core files like this:
$ ulimit -c unlimited
But, for some reason it was not dropping core files when I was running with
mpirun. Just to make sure it would do what I expected, I wrote a little C
program that was kind of like this:
char *ptr = (char *) 4;             /* bogus pointer */
fprintf(stderr, "bad! %s\n", ptr);  /* dereferencing it faults */
That would give a segmentation fault. It dropped a core file like you would
expect. Am I missing something?
Sam Adams
General Dynamics - Network Systems
Phone: 210.536.5945
-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Jeff Squyres (jsquyres)
Sent: Saturday, April 08, 2006 6:25 AM
To: Open MPI Users
Subject: Re: [OMPI users] job running question
Some process is exiting on a segv -- are you getting any corefiles?
If not, can you increase your coredumpsize to unlimited? This should
let you get a corefile; can you send the backtrace from that corefile?
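A sketch of that workflow, assuming bash and gdb are available ("f_5x5" and the
core file name "core.12345" are placeholders for your actual program and the
core it leaves behind):

```shell
# Sketch: enable core dumps, reproduce the crash, and pull a backtrace.
ulimit -c unlimited                    # allow core files of any size
mpirun -np 2 f_5x5                     # re-run; the segfaulting rank drops a core
gdb f_5x5 core.12345 -ex bt -ex quit   # load the core and print the backtrace
```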
-----Original Message-----
From: users-boun...@open-mpi.org
[mailto:users-boun...@open-mpi.org] On Behalf Of Adams Samuel
D Contr AFRL/HEDR
Sent: Friday, April 07, 2006 11:53 AM
To: 'us...@open-mpi.org'
Subject: [OMPI users] job running question
We are trying to build a new cluster running OpenMPI. We were previously
running LAM-MPI. To run jobs we would do the following:
$ lamboot lam-host-file
$ mpirun C program
I am not sure if this works more or less the same way with ompi. We were
trying to run it like this:
[james.parker@Cent01 FORTRAN]$ mpirun --np 2 f_5x5 localhost
mpirun noticed that job rank 1 with PID 0 on node "localhost" exited on signal 11.
[Cent01.brooks.afmc.ds.af.mil:16124] ERROR: A daemon on node localhost failed to start as expected.
[Cent01.brooks.afmc.ds.af.mil:16124] ERROR: There may be more information available from
[Cent01.brooks.afmc.ds.af.mil:16124] ERROR: the remote shell (see above).
[Cent01.brooks.afmc.ds.af.mil:16124] The daemon received a signal 11.
1 additional process aborted (not shown)
[james.parker@Cent01 FORTRAN]$
We have ompi installed to /usr/local, and these are our environment
variables:
[james.parker@Cent01 FORTRAN]$ export
declare -x COLORTERM="gnome-terminal"
declare -x DBUS_SESSION_BUS_ADDRESS="unix:abstract=/tmp/dbus-sfzFctmRFS"
declare -x DESKTOP_SESSION="default"
declare -x DISPLAY=":0.0"
declare -x GDMSESSION="default"
declare -x GNOME_DESKTOP_SESSION_ID="Default"
declare -x GNOME_KEYRING_SOCKET="/tmp/keyring-x8WQ1E/socket"
declare -x GTK_RC_FILES="/etc/gtk/gtkrc:/home/BROOKS-2K/james.parker/.gtkrc-1.2-gnome2"
declare -x G_BROKEN_FILENAMES="1"
declare -x HISTSIZE="1000"
declare -x HOME="/home/BROOKS-2K/james.parker"
declare -x HOSTNAME="Cent01"
declare -x INPUTRC="/etc/inputrc"
declare -x KDEDIR="/usr"
declare -x LANG="en_US.UTF-8"
declare -x LD_LIBRARY_PATH="/usr/local/lib:/usr/local/lib/openmpi"
declare -x LESSOPEN="|/usr/bin/lesspipe.sh %s"
declare -x LOGNAME="james.parker"
declare -x LS_COLORS="no=00:fi=00:di=00;34:ln=00;36:pi=40;33:so=00;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=00;32:*.cmd=00;32:*.exe=00;32:*.com=00;32:*.btm=00;32:*.bat=00;32:*.sh=00;32:*.csh=00;32:*.tar=00;31:*.tgz=00;31:*.arj=00;31:*.taz=00;31:*.lzh=00;31:*.zip=00;31:*.z=00;31:*.Z=00;31:*.gz=00;31:*.bz2=00;31:*.bz=00;31:*.tz=00;31:*.rpm=00;31:*.cpio=00;31:*.jpg=00;35:*.gif=00;35:*.bmp=00;35:*.xbm=00;35:*.xpm=00;35:*.png=00;35:*.tif=00;35:"
declare -x MAIL="/var/spool/mail/james.parker"
declare -x OLDPWD="/home/BROOKS-2K/james.parker/build/SuperLU_DIST_2.0"
declare -x PATH="/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/BROOKS-2K/james.parker/bin:/usr/local/bin"
declare -x PERL5LIB="/usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi:/usr/lib/perl5/site_perl/5.8.5"
declare -x PWD="/home/BROOKS-2K/james.parker/build/SuperLU_DIST_2.0/FORTRAN"
declare -x SESSION_MANAGER="local/Cent01.brooks.afmc.ds.af.mil:/tmp/.ICE-unix/14516"
declare -x SHELL="/bin/bash"
declare -x SHLVL="2"
declare -x SSH_AGENT_PID="14541"
declare -x SSH_ASKPASS="/usr/libexec/openssh/gnome-ssh-askpass"
declare -x SSH_AUTH_SOCK="/tmp/ssh-JUIxl14540/agent.14540"
declare -x TERM="xterm"
declare -x USER="james.parker"
declare -x WINDOWID="35651663"
declare -x XAUTHORITY="/home/BROOKS-2K/james.parker/.Xauthority"
[james.parker@Cent01 FORTRAN]$
Any ideas??
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users