Hi,

On 10.09.2013 at 17:00, Burian, John wrote:

> I have an OGS 2011.11p1 cluster. The primary submit host is a separate 
> machine from the queue master. When I try to use qrsh from the submit node, I 
> get a commlib error (Levi-Montalcini01 is the queue master, Levi-Montalcini86 
> is a compute node):
> 
> $ qrsh  -verbose
> Your job 590725 ("QRLOGIN") has been submitted
> waiting for interactive job to be scheduled ...
> Your interactive job 590725 has been successfully scheduled.
> Establishing builtin session to host Levi-Montalcini86 ...
> error: commlib error: local host name error (IP based host name resolving 
> "Levi-Montalcini01" doesn't match client host name from connect message 
> "Levi-Montalcini86")
> $
> 
> When I use qrsh from the queue master, it works fine:
> 
> $ qrsh -verbose
> Your job 590750 ("QRLOGIN") has been submitted
> waiting for interactive job to be scheduled ...
> Your interactive job 590750 has been successfully scheduled.
> Establishing builtin session to host Levi-Montalcini88 ...
> Levi-Montalcini88|~> 
> 
> During the failed attempt, I see traffic from the compute node back to the 
> queue master, but no traffic to the submit node from either the queue master 
> or the compute node. Is qrsh from a separate submit node expected to work?

Yes, as long as there is a direct connection between the submit host and the 
execution host (or proper forwarding between them).

Do Levi-Montalcini01 and Levi-Montalcini86 resolve to the same IP address? 
Why are there different names?
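
One way to check the name resolution, as a minimal sketch only: the Python 
snippet below uses the standard socket module to look up each host name and 
then reverse-resolve the resulting address. The function names 
compare_resolution and shared_addresses are made up for this illustration, 
and gethostbyname only covers IPv4, so treat it as a starting point rather 
than a complete diagnosis.

```python
import socket


def compare_resolution(names, forward=socket.gethostbyname,
                       reverse=socket.gethostbyaddr):
    """Return {name: (address, reverse_name)} for each host name.

    On a lookup failure the entry is (None, error_text) instead.
    The forward/reverse parameters exist only so the lookups can be
    swapped out, e.g. for testing without a network.
    """
    report = {}
    for name in names:
        try:
            address = forward(name)             # name -> IPv4 address
            reverse_name = reverse(address)[0]  # address -> primary name
        except OSError as exc:                  # gaierror/herror are OSErrors
            report[name] = (None, str(exc))
            continue
        report[name] = (address, reverse_name)
    return report


def shared_addresses(report):
    """Return {address: [names]} for addresses claimed by several names."""
    by_address = {}
    for name, (address, _) in report.items():
        if address is not None:
            by_address.setdefault(address, []).append(name)
    return {addr: sorted(ns) for addr, ns in by_address.items() if len(ns) > 1}
```

Calling shared_addresses(compare_resolution(["Levi-Montalcini01", 
"Levi-Montalcini86"])) on the submit host, the queue master and the compute 
node should return an empty dict on each of them if the two names map to 
different addresses. A non-empty result, or a reverse name in the report that 
differs from the name that was looked up, would point to the kind of mismatch 
the commlib error complains about.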

-- Reuti