Hi Jeff Squyres,

The issue was resolved after resetting the environment variables in the user script.
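In case it helps anyone else who hits this: a minimal sketch of the kind of reset done in the user (bsub) script. The install prefix /opt/openmpi and the application name my_mpi_app are placeholders, not our actual paths; point them at wherever Open MPI is installed on all nodes.

    #!/bin/bash
    # Hypothetical sketch: prepend the Open MPI install (placeholder path) so that
    # mpirun locally -- and orted on every remote node -- resolve to the same build.
    export MPI_HOME=/opt/openmpi
    export PATH=$MPI_HOME/bin:$PATH
    # Per the mpirun error text, LD_LIBRARY_PATH is forwarded to the remote nodes,
    # so the daemons can find the shared libraries there as well.
    export LD_LIBRARY_PATH=$MPI_HOME/lib:$LD_LIBRARY_PATH

    mpirun ./my_mpi_app    # placeholder application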
Thanks,
Bharati Singh

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres (jsquyres)
Sent: Monday, June 17, 2013 7:13 PM
To: Open MPI Users
Subject: Re: [OMPI users] lsb_launch failed: 0

I'm not an LSF expert, but this usually means that the Open MPI helper executable named "orted" could not be found on the remote nodes. Is your PATH set properly, both locally and remotely, so that the Open MPI executables can be found?

On Jun 17, 2013, at 7:01 AM, "Singh, Bharati (GE Global Research, consultant)" <bharati.si...@ge.com> wrote:

> Hi Team,
>
> Our users' jobs are exiting with the error below on random nodes. Could you please help us resolve this issue?
>
> [root@bng1grcdc200 output.228472]# cat user_script.stderr
> [bng1grcdc181:08381] [[54933,0],0] ORTE_ERROR_LOG: The specified
> application failed to start in file plm_lsf_module.c at line 308
> [bng1grcdc181:08381] lsb_launch failed: 0
> --------------------------------------------------------------------------
> A daemon (pid unknown) died unexpectedly on signal 1 while
> attempting to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed
> shared libraries on the remote node. You may set your LD_LIBRARY_PATH
> to have the location of the shared libraries on the remote nodes and
> this will automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the
> process that caused that situation.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun was unable to cleanly terminate the daemons on the nodes
> shown below. Additional manual cleanup may be required - please refer
> to the "orte-clean" tool for assistance.
> --------------------------------------------------------------------------
> bng1grcdc172 - daemon did not report back when launched
> bng1grcdc154 - daemon did not report back when launched
> bng1grcdc198 - daemon did not report back when launched
> bng1grcdc183 - daemon did not report back when launched
> bng1grcdc187 - daemon did not report back when launched
> bng1grcdc196 - daemon did not report back when launched
> bng1grcdc153 - daemon did not report back when launched
> bng1grcdc173 - daemon did not report back when launched
> bng1grcdc193 - daemon did not report back when launched
> bng1grcdc185 - daemon did not report back when launched
> bng1grcdc176 - daemon did not report back when launched
> bng1grcdc190 - daemon did not report back when launched
> bng1grcdc194 - daemon did not report back when launched
> bng1grcdc156 - daemon did not report back when launched
>
> Thanks,
> Bharati Singh
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
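A quick sanity check for Jeff's PATH suggestion is to see whether orted resolves in a non-interactive shell on one of the affected nodes; the hostname below is just one node taken from the list above:

    ssh bng1grcdc172 'which orted; echo $PATH'

If this prints no orted (or an unexpected PATH), the remote non-interactive shell startup is not putting the Open MPI bin directory on the PATH, which would explain the "daemon did not report back when launched" messages.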