Is there any way to prevent output from more than one node being written to the same line? I tried setting --output-filename, which didn't help; for some reason only stdout was written to the files. That makes an output file of close to 6 MB a little hard to read.
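For reference, roughly the invocations I mean (binary name and rank count are placeholders; the exact per-rank file suffix varies by version):

    # one file per rank: each process's output goes to its own out.<rank> file
    mpirun --output-filename out -np 16 ./hello_mpi

    # alternative: tag every line with [jobid,rank] so interleaved
    # output can still be attributed to a process
    mpirun --tag-output -np 16 ./hello_mpi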
--
Bharath

On Thu, Feb 14, 2013 at 07:35:02AM -0800, Ralph Castain wrote:
> Sounds like the orteds aren't reporting back to mpirun after launch. The
> MPI_proctable observation just means that the procs didn't launch in those
> cases where it is absent, which is something you already observed.
>
> Set "-mca plm_base_verbose 5" on your cmd line. You should see each orted
> report back to mpirun after it launches. If not, then it is likely that
> something is blocking it.
>
> You could also try updating to 1.6.3/4 in case there is some race condition
> in 1.6.1, though we haven't heard of it to-date.
>
> On Feb 14, 2013, at 7:21 AM, Bharath Ramesh <bram...@vt.edu> wrote:
>
> > On our cluster we are noticing intermittent job launch failures when
> > using Open MPI. We are currently using Open MPI 1.6.1, integrated with
> > Torque 4.1.3. It fails even for a simple MPI hello-world application.
> > orted gets launched on all the nodes, but there are a bunch of nodes
> > that don't launch the actual MPI application. No errors are reported
> > when the job gets killed because the walltime expires; enabling
> > --debug-daemons doesn't show any errors either. The only difference is
> > that successful runs have MPI_proctable listed, while for failures it
> > is absent. Any help in debugging this issue is greatly appreciated.
> >
> > --
> > Bharath
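In case it helps anyone hitting the same thing, the verbose launch suggested above would look something like this (binary name and rank count are placeholders):

    # verbose launcher output; each orted should log as it reports
    # back to mpirun after launching
    mpirun -mca plm_base_verbose 5 --debug-daemons -np 16 ./hello_mpi

A node whose daemon never checks in would be the one worth investigating.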