Thanks, Chris and Reuti, for the responses, but I suspect my problem is here:

On 9/10/13 11:55 AM, "John Kloss" <[email protected]> wrote:

>Is your submit host multi-homed?

The submit host and the queue master are both multi-homed. The cluster has
an 'internal' network; all the compute nodes sit on this network, and the
submit host and queue master both have interfaces on it. Then there is the
'external' network, on which the submit host and queue master also have
interfaces, along with the user desktop machines. One design goal was that
desktop machines could eventually be used as submit hosts, so the queue
master has to function on both networks. The compute nodes on the
'internal' network communicate with the external network only through the
queue master, which runs an IP masquerade iptables rule.

Now that I've explained it, I see what the problem is: the submit host is
communicating with the queue master over the external network; the queue
master starts the interactive job on the compute node, which tries to
contact the submit host at its external address, so the connection gets
routed through the queue master and its iptables rule. Qrsh then sees a
connection that claims to be from node 87 but carries the queue master's
IP address.
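
A minimal sketch of that mismatch in Python, under stated assumptions: the
host names, the addresses, and the helpers `peer_matches_claim` and
`masquerade` are hypothetical and only model the idea. This is not how
Grid Engine itself performs the check.

```python
# Hypothetical name-to-address table; none of these values come from the
# real cluster.
HOSTS = {
    "node87": "10.1.0.87",       # compute node on the internal network
    "qmaster-ext": "192.0.2.1",  # queue master's external interface
}


def peer_matches_claim(claimed_host, peer_ip, hosts=HOSTS):
    """Return True if the connection's source IP is the address that the
    claimed host name maps to."""
    return hosts.get(claimed_host) == peer_ip


def masquerade(src_ip, gateway_ip):
    """Model an IP masquerade rule: the original source address is
    replaced by the gateway's address."""
    return gateway_ip


# Direct connection: the source address is the node's own address.
direct = peer_matches_claim("node87", HOSTS["node87"])

# Routed through the queue master: the source address has been rewritten.
natted = peer_matches_claim(
    "node87", masquerade(HOSTS["node87"], HOSTS["qmaster-ext"])
)

print(direct, natted)  # prints: True False
```

The second case is the one described above: the claimed name says node 87,
but the source address belongs to the queue master.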

I believe there are some changes I can make on the submit host to fix this
specific problem, but I think I may still have trouble with the queue
master functioning on both networks. I vaguely recall reading about a
configuration option that told OGS about allowable host aliases. Am I
misremembering that?

Thanks,
John



>I have had issues where I had a
>multi-homed submit host, say, hostA, which connects to two networks
>via
>
>hostA-int -> "grid network"
>hostA-ext -> "gateway network"
>
>Where "gateway network" and "grid network" do not route because
>they're isolated from each other.
>
>And the hostname used by hostA to contact a compute node is hostA-ext.
> The compute node can't reach hostA-ext; it can only reach hostA-int.
>I had to change the hostname for hostA to hostA-int (under
>/etc/hostname or /etc/sysconfig/network or /etc/node, etc.) so that
>IP/hostname resolution matched for the "grid network".
>
>Or, perhaps your submit host local hostname does not match your domain
>name lookup mechanism (DNS, NIS, etc.). That is, your submit host
>thinks its name is hostA.localhost and DNS thinks it's
>hostA-submit.somenet.com.
>
>What do you get when you type from the submit host
>
>hostname
>
>vs.
>
>nslookup <submit_hostname>
>
>?
>
>Thanks.
>
>  John.
>_______________________________________________
>users mailing list
>[email protected]
>https://gridengine.org/mailman/listinfo/users
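
The hostname-versus-lookup comparison suggested in the quoted message can
also be sketched in Python. This is a rough equivalent, not the same test:
`socket.gethostbyname_ex` goes through the system resolver (hosts file,
NIS, DNS, depending on configuration), whereas `nslookup` queries DNS
directly. The `name_check` helper and its return format are assumptions
made for this example.

```python
import socket


def name_check(local_name=None, resolver=socket.gethostbyname_ex):
    """Compare the local hostname with what the resolver reports for it.

    `resolver` must behave like socket.gethostbyname_ex and return a
    (canonical_name, alias_list, address_list) tuple.
    """
    local_name = local_name or socket.gethostname()
    try:
        canonical, aliases, addresses = resolver(local_name)
    except OSError as exc:  # socket.gaierror and socket.herror derive from OSError
        return {"local": local_name, "resolved": None,
                "addresses": [], "match": False, "error": str(exc)}
    known_names = {canonical, *aliases}
    return {"local": local_name, "resolved": canonical,
            "addresses": list(addresses),
            "match": local_name in known_names, "error": None}


if __name__ == "__main__":
    # Run on the submit host; a False "match" means the local name and the
    # resolver disagree.
    print(name_check())
```

If `match` is false, or the listed addresses are not on the network the
compute nodes can reach, that points at the kind of mismatch described in
the quoted message.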


