Thank you for continuing to help me with this one.

On Thu, Apr 11, 2013 at 08:19:07PM +0200, Reuti wrote:
> I would say: "*Only* when a job fails, strace shows that mpirun used
> "qrsh -inherit ...". If it's local, there should only be forks in the
> process listing for a running job. If it's making a local "qrsh
> -inherit ..." something is wrong with Open MPI detection on the
> hostname.

You're right. Do you know how I could make Open MPI tell me more about
what it is doing? I already use the mpi_param_check,
mpi_show_handle_leaks, mpi_show_mca_params, mpi_keep_peer_hostnames,
mpi_abort_print_stack, orte_debug, orte_debug_verbose,
orte_debug_daemons, ras_gridengine_debug, ras_gridengine_verbose,
ras_gridengine_show_jobid, and ras_base_verbose MCA parameters, but
nothing useful appears.

> Does the script specify by accident a dedicated hostlist or refer to a
> self-assembled hostfile (which would contradict the granted machine)?

I don't think so. Anyway, it fails at random without any parameter
being changed.

> No "-nolocal" specified?

No.

--
Bernard Massot
