The Origin documentation suggests that, by default, OpenShift listens for
traffic on ports 80 and 443 on all host network interfaces. Although it is
not explicitly stated anywhere I can find, this suggests to me that the
default network plugins treat all interfaces uniformly. However, I have
encountered some odd network behavior on OpenShift master nodes whose
virtual hosts have multiple NICs available. Because there are so many
different ways to configure networking, I need help determining whether I
have a misconfiguration or have encountered a bug.

I have a private cloud that uses a Microsoft hypervisor to host my VMs. It
has been configured with two virtual networks: a completely "isolated"
network, and a "public" network capable of communicating directly with
other computers on my physical network and with the internet.

I created a few VMs for my OpenShift cluster, each with a single NIC on the
isolated network, and installed the minimal server image of CentOS 7.4 on
each of them (plus 1 additional VM running the Atomic Host variant). I then
created a bastion VM with 2 NICs: one on the same isolated network, and the
other on my "public" network, allowing SSH access from my physical
workstation. This VM hosted DNS services for the isolated network and was
the point from which I ran the Ansible scripts that install OpenShift
Origin onto the isolated VMs. So I can SSH from my workstation into the
bastion host, and from there SSH into the isolated VMs (see the
~/.ssh/config sketch after the diagram below).

[workstation] --- // (gateway) // ---- ("public" vLAN) ---- [bastion host]
---- ("isolated" vLAN) --- [OpenShift nodes]

where the subnet used for the public vLAN is 172.18.8.0/23, and the subnet
used by the OpenShift cluster on the private vLAN is 192.168.1.0/24.
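
For reference, the hop pattern expressed as a ~/.ssh/config sketch (the
host names, user, and the bastion's public address are illustrative
placeholders, not my real values):

    # ~/.ssh/config on the workstation (sketch)
    Host bastion
        HostName 172.18.8.10    # placeholder: bastion's address on the public vLAN
        User centos

    Host node-*                 # placeholder pattern for the isolated VMs
        ProxyJump bastion       # hop through the bastion to reach 192.168.1.0/24
        User centos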

I am using the custom network configuration in Origin, so, for example,
iptables is in use rather than firewalld. The only non-standard
configuration was manually setting --bip for the Docker daemon to
192.168.200.1/24, even though there was no risk of address overlap to begin
with.
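
For reference, the bridge IP was pinned roughly like this (a sketch; I am
assuming the usual CentOS 7 location in /etc/sysconfig/docker rather than
daemon.json):

    # /etc/sysconfig/docker (relevant line only; sketch)
    OPTIONS='--selinux-enabled --bip=192.168.200.1/24'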

After running applications successfully on OpenShift Origin 3.7 in
isolation with 4 nodes (including 1 master+etcd), I decided to open the
OpenShift master to traffic on the "public" vLAN, so I added a second
network interface (eth1) to the OpenShift master node. NetworkManager shows
the connection is up, and it successfully received a DHCP assignment. I am
able to communicate with other VMs on that same subnet, but unlike all
those other VMs NOT running OpenShift, this VM seems to allow traffic ONLY
within the subnet on that public NIC. In other words, it is as if the
gateway were misconfigured; but since I used DHCP and can see the correct
default route in the ip route output, that doesn't seem to be the case. As
if that weren't odd enough, my oc commands stopped working from the master
node.
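
For reference, these are the checks I relied on above (real commands, but I
haven't reproduced their output here):

    nmcli device status    # confirms eth1 is up and managed
    ip addr show eth1      # confirms the DHCP-assigned address
    ip route show          # confirms the default route via the public gateway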

Within my private vLAN, the master node's FQDN was openshift.private.net.
Because the public vLAN's DNS server does not know the "private.net"
domain, if I simply run
oc login
from the master node, the response is

Unable to connect to the server: dial tcp: lookup openshift.private.net on
172.18.9.19:53: no such host


FWIW, I have PEERDNS=no in the public interface's configuration, yet
99-origin-dns.sh appears to have applied the public interface's DNS
settings to /etc/resolv.conf, so it looks like I may need to specify the
DNS server manually during the Origin installation?
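
For context, the public interface configuration looks roughly like this (a
sketch from memory; only the relevant lines):

    # /etc/sysconfig/network-scripts/ifcfg-eth1 (sketch)
    DEVICE=eth1
    BOOTPROTO=dhcp
    ONBOOT=yes
    PEERDNS=no    # should prevent DHCP from overwriting /etc/resolv.conf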

To test the theory that this is simply a DNS lookup issue, I manually
edited /etc/resolv.conf. Now the oc commands yield a different error:

The connection to the server openshift.private.net:8443 was refused - did
you specify the right host or port?
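
The manual edit was along these lines (<private-dns-ip> is a placeholder
for the bastion's address on the isolated network; I'm deliberately not
reproducing my exact values):

    # /etc/resolv.conf after the manual edit (sketch)
    search private.net
    nameserver <private-dns-ip>

and, to rule out the API server simply not listening on 8443, a check like

    ss -tlnp | grep 8443

seems like the obvious next step, though I haven't captured its output
here.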

Since this is now beyond my basic networking expertise, I tried another
cluster in which ALL of the VM interfaces were on the public vLAN.
Communication between the VMs and the other devices on my network was
successful, even with two public NICs on the master node. Then I deployed
OpenShift Origin 3.6, and suddenly I could communicate only over eth0.
Otherwise, everything worked until I rebooted the master node. At that
point I noticed that internal name resolution began failing, so I looked at
the ip route output and found that the order in which the devices appeared
changed from one boot to the next. Only when eth0 was first in the list did
internal name resolution work.
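
One workaround I'm considering is pinning the route metrics so that eth0's
routes always sort first (a sketch, assuming the NetworkManager connection
names match the device names; untested on my setup):

    # lower metric wins, so eth0's default route should always come first
    nmcli connection modify eth0 ipv4.route-metric 100
    nmcli connection modify eth1 ipv4.route-metric 200
    nmcli connection up eth0
    nmcli connection up eth1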

So, before I start banging my head against a wall: is there a variable I
should set explicitly in my inventory file, prior to deploying Origin, to
prevent OpenShift from breaking communication over the additional NICs?
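
Something along these lines is what I'm imagining (a sketch; openshift_ip
and openshift_public_ip are the closest inventory variables I've found in
the openshift-ansible docs, and the addresses are placeholders, but I don't
know whether they actually address this):

    [masters]
    openshift.private.net openshift_ip=192.168.1.X openshift_public_ip=172.18.8.X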