Hello,

My recent job started normally, but after running for a few hours it died with the following message:

--------------------------------------------------------------------------
A daemon (pid 19390) died unexpectedly with status 137 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.

The scheduling script is below:

#$ -S /bin/bash
#$ -cwd
#$ -N SC3blastx_64-96thr
#$ -pe openmpi* 64-96
#$ -l h_rt=24:00:00,vf=3G
#$ -j y
#$ -M yaxi...@gmail.com
#$ -m eas
#
# Load the appropriate module files
# Should be loaded already
#$ -V
mpirun -np $NSLOTS blastx -query $UABGRID_SCRATCH/SC/AdQ30/fasta/SC1-IS4-Ind1-153ngFr1sep1run1R1AdQ30.fasta -db nr -out $UABGRID_SCRATCH/SC/blastx/SC/SC1-IS4-Ind1-153ngFr1sep1run1R1AdQ30.out -evalue 0.001 -max_intron_length 100000 -outfmt 5 -num_alignments 20 -lcase_masking -num_threads $NSLOTS

What caused this termination? It does not seem to be a scheduling problem, as the program ran for several hours with 96 threads. My $LD_LIBRARY_PATH does contain the /share/apps/openmpi/1.6.4-gcc/lib entry, so how else should I modify it?

Vladimir