Thank you for continuing to help me with this one.

On Thu, Apr 11, 2013 at 08:19:07PM +0200, Reuti wrote:
> I would say: "*Only* when a job fails, strace shows that mpirun used
> "qrsh -inherit ...". If it's local, there should only be forks in the
> process listing for a running job. If it's making a local "qrsh
> -inherit ..." something is wrong with Open MPI detection on the
> hostname.
You're right. Do you know how I could make Open MPI tell me more about
what it is doing?
I already use the mpi_param_check, mpi_show_handle_leaks,
mpi_show_mca_params, mpi_keep_peer_hostnames, mpi_abort_print_stack,
orte_debug, orte_debug_verbose, orte_debug_daemons,
ras_gridengine_debug, ras_gridengine_verbose, ras_gridengine_show_jobid,
and ras_base_verbose MCA parameters, but nothing useful appears.
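
For anyone following along, here is a minimal sketch of one way such
parameters can be set, using Open MPI's OMPI_MCA_<name> environment
variable convention. The value 1 for every parameter is only an
assumption for illustration; some of them accept other values or
verbosity levels.

```shell
# Sketch only: enable the MCA parameters named above via the environment.
# Open MPI treats an environment variable OMPI_MCA_<param> as a setting
# for the MCA parameter <param>.
# Assumption: the value 1 is acceptable for each of these parameters.
for param in mpi_param_check mpi_show_handle_leaks mpi_show_mca_params \
             mpi_keep_peer_hostnames mpi_abort_print_stack \
             orte_debug orte_debug_verbose orte_debug_daemons \
             ras_gridengine_debug ras_gridengine_verbose \
             ras_gridengine_show_jobid ras_base_verbose; do
    export "OMPI_MCA_${param}=1"
done

# Show the settings that a subsequently started mpirun would inherit.
env | grep '^OMPI_MCA_' | sort
```

An mpirun started from the same job script inherits these settings,
so the command line itself does not have to change.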
 
> Does the script specify by accident a dedicated hostlist or refer to a
> self-assembled hostfile (which would contradict the granted machine)?
I don't think so. In any case, it fails at random, without any
parameter being changed between runs.

> No "-nolocal" specified?
No.
-- 
Bernard Massot
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users