Hmm, I didn't know those were segfault reports.  It could indeed be a
segfault if the code isn't exiting properly - but the code really is trying
to exit there with the "SCF failed to converge" error.  Thanks for the help!
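
For what it's worth, the stderr output further down reports signal 6 (Aborted), which is SIGABRT raised via abort() rather than a segmentation fault (SIGSEGV, signal 11). Below is a minimal Python sketch, assuming a POSIX system, of how a parent process can tell "exited with a non-zero status" apart from "killed by a signal". The helper name `describe_exit` is made up for illustration and is not part of Open MPI or Q-Chem.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Classify a subprocess return code as a normal exit or death by signal."""
    if returncode < 0:
        # On POSIX, subprocess reports "terminated by signal N" as -N.
        sig = signal.Signals(-returncode)
        return f"killed by signal {sig.name} ({sig.value})"
    if returncode == 0:
        return "exited normally with status 0"
    return f"exited with non-zero status {returncode}"


if __name__ == "__main__":
    # Child that calls abort(), which raises SIGABRT in the child process.
    aborted = subprocess.run([sys.executable, "-c", "import os; os.abort()"])
    print(describe_exit(aborted.returncode))

    # Child that simply returns a non-zero exit status.
    failed = subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"])
    print(describe_exit(failed.returncode))
```

A process that calls abort() and one that calls exit(1) both count as failures, but only the first produces the "Process received signal" backtraces seen in the log.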



Thanks,



-          Lee-Ping



From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Sunday, September 21, 2014 11:49 AM
To: Open MPI Users
Subject: Re: [OMPI users] Process is hanging



Thanks - I asked because the output you sent shows a bunch of segfault
reports. I'll investigate the non-zero status question.



On Sep 21, 2014, at 10:02 AM, Lee-Ping Wang <leep...@stanford.edu> wrote:





My program isn't segfaulting - it's returning a non-zero status and then
exiting.



Thanks,



-          Lee-Ping



From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Sunday, September 21, 2014 8:54 AM
To: Open MPI Users
Subject: Re: [OMPI users] Process is hanging



Just to be clear: is your program returning a non-zero status and then
exiting, or is it segfaulting?





On Sep 21, 2014, at 8:22 AM, Lee-Ping Wang <leep...@stanford.edu> wrote:






I'm using version 1.8.1.



Thanks,



-          Lee-Ping



From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Sunday, September 21, 2014 6:56 AM
To: Open MPI Users
Subject: Re: [OMPI users] Process is hanging



Can you please tell us what version of OMPI you are using?





On Sep 21, 2014, at 6:08 AM, Lee-Ping Wang <leep...@stanford.edu> wrote:







Hi there,



I'm running into an issue where mpirun isn't terminating when my executable
has a nonzero exit status - instead it's hanging indefinitely.  I'm
attaching my process tree, the error message from the application, and the
messages printed to stderr.   Please let me know what I can do.
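
One possible stopgap while the hang itself is being investigated, sketched under assumptions: start the launcher in its own session with a time limit, and kill the whole process group if the limit expires. The function name `run_with_timeout` and the timeout value are made up for illustration. This is a generic POSIX technique, not Open MPI behavior, and it would not reach processes that have moved themselves into a different process group.

```python
import os
import signal
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd in a new session; kill its whole process group on timeout.

    Returns the child's return code, or None if it had to be killed.
    """
    # start_new_session=True makes the child a session and group leader,
    # so its pid doubles as the process group id.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Kill every process still in the child's process group.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()  # reap the child so it does not linger as a zombie
        return None


if __name__ == "__main__":
    # Hypothetical usage: a command that sleeps longer than the limit.
    print(run_with_timeout(["sleep", "5"], timeout_s=0.5))
```

The caller can treat a `None` result as "hung and was killed" and any integer as the real exit status of the command.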



Thanks,



-          Lee-Ping



=== Process Tree ===

leeping@vsp-compute-13:~$ ps xjf
PPID   PID  PGID   SID TTY      TPGID STAT   UID   TIME COMMAND
31969 31977 31969 31969 ?           -1 S    48618   0:00 sshd: leeping@pts/1
31977 31978 31978 31978 pts/1    32038 Ss   48618   0:00  \_ -bash
31978 32038 32038 31978 pts/1    32038 R+   48618   0:00      \_ ps xjf
23667 29307 29307 29307 ?           -1 Ss   48618   0:00 /bin/bash /home/leeping/temp/leeping-workers/10276/worker1.sh
29307 29308 29307 29307 ?           -1 S    48618   0:00  \_ /bin/bash /home/leeping/temp/leeping-workers/10276/worker2.sh
29308 29425 29307 29307 ?           -1 S    48618   0:00      \_ ./work_queue_worker -d all --cores 6 -t 86400s localhost 9876
29425 26245 26245 29307 ?           -1 S    48618   0:00      |   \_ sh -c optimize-geometry.py initial.xyz --method b3lyp --basis "6-31g(d)" --charge 0 --mult 1 &> optimize.log
26245 26246 26245 29307 ?           -1 Sl   48618   0:01      |       \_ /home/leeping/local/bin/python /home/leeping/local/bin/optimize-geometry.py initial.xyz --method b3lyp --basis 6-31g(d) --charge 0 --mult 1
26246 27834 26245 29307 ?           -1 S    48618   0:00      |           \_ /bin/sh -c qchem42 -np 6 -save optimize.in optimize.out optimize.d 2> optimize.err
27834 27835 26245 29307 ?           -1 S    48618   0:00      |               \_ /bin/bash /home/leeping/opt/bin/qchem42 -np 6 -save optimize.in optimize.out optimize.d
27835 27897 26245 29307 ?           -1 S    48618   0:00      |                   \_ /bin/csh -f /opt/scratch/leeping/opt/qchem-4.2/bin/qchem -np 6 -nt 1 -save optimize.in optimize.out optimize.d
27897 27921 26245 29307 ?           -1 S    48618   0:00      |                       \_ /bin/csh -f /opt/scratch/leeping/opt/qchem-4.2/bin/parallel.csh optimize.in 6 0 ./optimize.d/ 27897
27921 27926 26245 29307 ?           -1 Sl   48618   0:00      |                           \_ /opt/scratch/leeping/opt/qchem-4.2/ext-libs/openmpi/bin/mpirun -np 6 /opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe .optimize.in.27897.qcin.1 ./optimize.d/



=== Application Error Message ===

100    -843.2762335150      5.69E-08  00000    Convergence failure



Q-Chem fatal error occurred in module
/home/leeping/src/qchem/scfman/scfman.C, line 4377:



SCF failed to converge



Sat Sep 20 23:57:37 2014



=== Standard error ===


leeping@vsp-compute-13:/opt/scratch/leeping/worker-48618-29425/t.62$ cat optimize.err

[vsp-compute-13:27929] *** Process received signal ***

[vsp-compute-13:27929] Signal: Aborted (6)

[vsp-compute-13:27929] Signal code:  (-6)

[vsp-compute-13:27932] *** Process received signal ***

[vsp-compute-13:27932] Signal: Aborted (6)

[vsp-compute-13:27932] Signal code:  (-6)

[vsp-compute-13:27934] *** Process received signal ***

[vsp-compute-13:27934] Signal: Aborted (6)

[vsp-compute-13:27934] Signal code:  (-6)

[vsp-compute-13:27928] *** Process received signal ***

[vsp-compute-13:27928] Signal: Aborted (6)

[vsp-compute-13:27928] Signal code:  (-6)

[vsp-compute-13:27936] *** Process received signal ***

[vsp-compute-13:27936] Signal: Aborted (6)

[vsp-compute-13:27936] Signal code:  (-6)

[vsp-compute-13:27930] *** Process received signal ***

[vsp-compute-13:27930] Signal: Aborted (6)

[vsp-compute-13:27930] Signal code:  (-6)

[vsp-compute-13:27932] [ 0] /lib64/libpthread.so.0[0x3464c0eb70]

[vsp-compute-13:27932] [ 1] [vsp-compute-13:27928] [ 0]
/lib64/libpthread.so.0[0x3464c0eb70]

[vsp-compute-13:27928] [ 1] /lib64/libc.so.6(gsignal+0x35)[0x3464430265]

[vsp-compute-13:27928] [ 2] [vsp-compute-13:27934] [ 0]
/lib64/libpthread.so.0[0x3464c0eb70]

[vsp-compute-13:27934] [ 1] /lib64/libc.so.6(gsignal+0x35)[0x3464430265]

[vsp-compute-13:27934] [ 2] [vsp-compute-13:27929] [ 0]
/lib64/libpthread.so.0[0x3464c0eb70]

[vsp-compute-13:27929] [ 1] /lib64/libc.so.6(gsignal+0x35)[0x3464430265]

[vsp-compute-13:27929] [ 2] /lib64/libc.so.6(gsignal+0x35)[0x3464430265]

[vsp-compute-13:27932] [ 2] /lib64/libc.so.6(abort+0x110)[0x3464431d10]

[vsp-compute-13:27932] [ 3] /lib64/libc.so.6(abort+0x110)[0x3464431d10]

[vsp-compute-13:27928] [ 3]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0xc304ca6]

[vsp-compute-13:27928] [ 4]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x41a0cf5]

[vsp-compute-13:27928] [ 5] /lib64/libc.so.6(abort+0x110)[0x3464431d10]

[vsp-compute-13:27934] [ 3]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0xc304ca6]

[vsp-compute-13:27934] [ 4]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x41a0cf5]

[vsp-compute-13:27934] [ 5] /lib64/libc.so.6(abort+0x110)[0x3464431d10]

[vsp-compute-13:27929] [ 3]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0xc304ca6]

[vsp-compute-13:27929] [ 4]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x41a0cf5]

[vsp-compute-13:27929] [ 5]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0xc304ca6]

[vsp-compute-13:27932] [ 4]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x41a0cf5]

[vsp-compute-13:27932] [ 5]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x414a06e]

[vsp-compute-13:27934] [ 6]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x414a06e]

[vsp-compute-13:27929] [ 6]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x463392]

[vsp-compute-13:27929] [ 7]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x414a06e]

[vsp-compute-13:27928] [ 6]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x463392]

[vsp-compute-13:27928] [ 7]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x414a06e]

[vsp-compute-13:27932] [ 6]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x463392]

[vsp-compute-13:27932] [ 7]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x463392]

[vsp-compute-13:27934] [ 7]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x45cdb0]

[vsp-compute-13:27934] [ 8]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x45cdb0]

[vsp-compute-13:27932] [ 8]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x45cdb0]

[vsp-compute-13:27929] [ 8]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x45cdb0]

[vsp-compute-13:27928] [ 8]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x45bfb6]

[vsp-compute-13:27928] [ 9]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x45bfb6]

[vsp-compute-13:27932] [ 9]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x346441d994]

[vsp-compute-13:27932] [10]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x45bfb6]

[vsp-compute-13:27929] [ 9]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x346441d994]

[vsp-compute-13:27929] [10]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x346441d994]

[vsp-compute-13:27928] [10]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x45bfb6]

[vsp-compute-13:27934] [ 9]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x346441d994]

[vsp-compute-13:27934] [10]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x43e529]

[vsp-compute-13:27928] *** End of error message ***

/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x43e529]

[vsp-compute-13:27929] *** End of error message ***

/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x43e529]

[vsp-compute-13:27932] *** End of error message ***

/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x43e529]

[vsp-compute-13:27934] *** End of error message ***

[vsp-compute-13:27936] [ 0] /lib64/libpthread.so.0[0x3464c0eb70]

[vsp-compute-13:27936] [ 1] /lib64/libc.so.6(gsignal+0x35)[0x3464430265]

[vsp-compute-13:27936] [ 2] /lib64/libc.so.6(abort+0x110)[0x3464431d10]

[vsp-compute-13:27936] [ 3]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0xc304ca6]

[vsp-compute-13:27936] [ 4]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x41a0cf5]

[vsp-compute-13:27936] [ 5]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x414a06e]

[vsp-compute-13:27936] [ 6]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x463392]

[vsp-compute-13:27936] [ 7]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x45cdb0]

[vsp-compute-13:27936] [ 8]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x45bfb6]

[vsp-compute-13:27936] [ 9]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x346441d994]

[vsp-compute-13:27936] [10]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x43e529]

[vsp-compute-13:27936] *** End of error message ***

[vsp-compute-13:27930] [ 0] /lib64/libpthread.so.0[0x3464c0eb70]

[vsp-compute-13:27930] [ 1] /lib64/libc.so.6(gsignal+0x35)[0x3464430265]

[vsp-compute-13:27930] [ 2] /lib64/libc.so.6(abort+0x110)[0x3464431d10]

[vsp-compute-13:27930] [ 3]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0xc304ca6]

[vsp-compute-13:27930] [ 4]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x41a0cf5]

[vsp-compute-13:27930] [ 5]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x414a06e]

[vsp-compute-13:27930] [ 6]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x463392]

[vsp-compute-13:27930] [ 7]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x45cdb0]

[vsp-compute-13:27930] [ 8]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x45bfb6]

[vsp-compute-13:27930] [ 9]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x346441d994]

[vsp-compute-13:27930] [10]
/opt/scratch/leeping/opt/qchem-4.2/exe/qcprog.exe[0x43e529]

[vsp-compute-13:27930] *** End of error message ***

_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2014/09/25365.php





