Jonathan,

I’d suggest trying the following:
1. Add “dns=none” to the main section of /etc/NetworkManager/NetworkManager.conf
2. Restart NetworkManager
3. Edit /etc/resolv.conf manually: set a proper nameserver and remove the 
99-origin-dns.sh comment line.
4. Restart NetworkManager again
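The steps above can be sketched as shell commands (the sed invocation assumes a [main] section already exists in the file, and the nameserver address is a placeholder for your cluster's internal DNS):

```shell
# 1. Tell NetworkManager to stop managing /etc/resolv.conf
sudo sed -i '/^\[main\]/a dns=none' /etc/NetworkManager/NetworkManager.conf

# 2. Restart NetworkManager so the setting takes effect
sudo systemctl restart NetworkManager

# 3. Edit /etc/resolv.conf by hand: set the internal nameserver
#    (192.168.1.2 here is a placeholder) and delete the comment line
#    left behind by the 99-origin-dns.sh dispatcher script
sudo vi /etc/resolv.conf

# 4. Restart NetworkManager again
sudo systemctl restart NetworkManager
```

With dns=none, NetworkManager leaves resolv.conf alone, so the manual edit should survive subsequent restarts.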


From: <[email protected]> on behalf of Jonathan Lee 
<[email protected]>
Date: Saturday, 12 May 2018 at 01:42
To: "[email protected]" <[email protected]>
Subject: Enable traffic through an additional NIC

The Origin documentation suggests that by default, OpenShift listens to traffic 
on ports 80 and 443 over all host network interfaces. Although not explicitly 
stated anywhere I can find, this suggests to me that the default network 
plugins treat all interfaces uniformly. However, I have encountered some odd 
network behavior on OpenShift master nodes with multiple NICs available on 
their virtual hosts. Because there are so many different ways to configure 
networking, I need help determining whether I have a misconfiguration or if I 
have encountered a bug.

I have a private cloud that uses a Microsoft Hypervisor for each of my VMs. It 
has been configured with two virtual networks: a completely "isolated" network 
and a "public" network capable of communicating directly with other computers 
in my physical network and with the internet.

I created a few VMs for my OpenShift cluster, each with a NIC pointing to the 
isolated network, and installed the minimal server image of CentOS 7.4 on each 
of them (with 1 additional VM running the Atomic Host variant).
I then created an additional VM with 2 NICs, where 1 NIC was on the same 
isolated network, and the other NIC was on my "public" network, allowing SSH 
access from my physical workstation. This VM was used to host DNS services to 
the isolated network and was the point from which I executed Ansible scripts to 
install OpenShift Origin onto the isolated VMs. So I can SSH from my 
workstation into the bastion host, from which point I can SSH into the isolated 
VMs.

[workstation] --- // (gateway) // ---- ("public" vLAN) ---- [bastion host] ---- 
("isolated" vLAN) --- [OpenShift nodes]

where the subnet used for the public vLAN is 172.18.8.0/23, and the subnet used 
by the OpenShift cluster on the private vLAN is 192.168.1.0/24.

I am using the custom network configuration in Origin, so, for example, 
iptables is being used (not firewalld). The only non-standard config was 
manually setting the --bip for the Docker daemon to 192.168.200.1/24, although 
there wasn't any risk of address overlap.
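For anyone reproducing this, the --bip override can be expressed as a daemon.json fragment; this is one possible location (on a CentOS 7 Origin host it may instead be set via the OPTIONS line in /etc/sysconfig/docker, and openshift-ansible may manage that file itself):

```json
{
  "bip": "192.168.200.1/24"
}
```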

After running applications successfully on OpenShift Origin 3.7 in isolation 
with 4 nodes (including 1 master+etcd), I decided to open the OpenShift master 
to network traffic on the "public" vLAN, so I added a second network interface 
(eth1) to the OpenShift master node. NetworkManager shows the connection is up, 
and it successfully received a DHCP assignment. I am able to communicate with 
other VMs camped on that same subnet, but unlike all those other VMs NOT 
running OpenShift, this VM seems to ONLY allow traffic within the subnet on 
that public NIC. In other words, it's as if the gateway is misconfigured, but 
since I used DHCP and can see the correct route in the ip route output, that 
doesn't seem to be the case. As if that wasn't odd enough, my oc commands 
stopped working from the master node.

Within my private vLAN, the master node's FQDN was openshift.private.net. 
Because the public vLAN's DNS server is unfamiliar with the "private.net" 
subdomain, if I simply run
oc login
from the master node, the response is

Unable to connect to the server: dial tcp: lookup openshift.private.net on 
172.18.9.19:53: no such host


FWIW, I have PEERDNS=no in the public interface configuration, yet 
99-origin-dns.sh appears to have applied the public interface's DNS 
configuration to /etc/resolv.conf, so it looks like the DNS server may need to 
be specified manually during Origin installation?
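For comparison, this is roughly the ifcfg file in question; since 99-origin-dns.sh is a NetworkManager dispatcher script, the NM-related keys are worth checking too. All values below are illustrative, not a known-good config:

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth1 (illustrative values)
DEVICE=eth1
BOOTPROTO=dhcp
ONBOOT=yes
PEERDNS=no    # don't let DHCP on this NIC rewrite /etc/resolv.conf
DEFROUTE=no   # keep the default route on eth0; may also help the gateway issue
```

Note that PEERDNS=no only tells the dhclient/initscripts path not to touch resolv.conf; a dispatcher script that rewrites the file itself can still override it.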

To test the theory that this is simply an issue with DNS lookup, I manually 
edited the /etc/resolv.conf file. Now the oc commands yield a different error:

The connection to the server openshift.private.net:8443 was refused - did you 
specify the right host or port?
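"Connection refused" (as opposed to a timeout) usually means the packet reached the host but nothing was listening on that address/port, or an iptables REJECT rule fired. A quick way to narrow it down from the master itself, using standard CentOS 7 tooling (the private IP below is a placeholder):

```shell
# Is the API server listening on 8443, and bound to which address?
ss -tlnp | grep 8443

# Does the name now resolve to the address you expect?
getent hosts openshift.private.net

# Hit the API health endpoint directly by IP, bypassing DNS
# (replace 192.168.1.10 with the master's real private address)
curl -k https://192.168.1.10:8443/healthz
```

If ss shows the listener bound only to the private address while the name resolves to the public one (or vice versa), that mismatch would explain the refusal.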

Since this is now beyond my basic networking expertise, I tried another 
cluster, where ALL VM interfaces were on the public vLAN. Communication between 
the VMs and the other devices on my network was successful, even with two 
public NICs on the master node, so I deployed OpenShift Origin 3.6, and 
suddenly I could only communicate over eth0. Otherwise, everything worked until 
I rebooted the master node. At that point, internal name resolution began 
failing, so I looked at the ip route output and found that the order in which 
the devices appeared changed from one boot to the next. Only when eth0 was 
first in the list did internal name resolution work.
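The boot-to-boot reordering can be observed directly: with two DHCP-configured NICs, each can install its own default route, and which gateway wins depends on route metrics and device enumeration order. A sketch of how to inspect and pin this (the ifcfg keys named in the comments are illustrative suggestions, not something the installer sets):

```shell
# Show all default routes; with two DHCP NICs there may be two entries,
# and the one with the lowest metric wins:
ip route show default

# To pin the default route to eth0, either set DEFROUTE=no in ifcfg-eth1,
# or give eth1's routes a higher (worse) metric, e.g. on an
# NM-managed system:  IPV4_ROUTE_METRIC=200 in ifcfg-eth1
```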

So before I start banging my head against a wall, is there a variable I should 
set explicitly in my inventory file prior to deploying Origin to prevent 
OpenShift from breaking communication over additional NICs?
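In case it helps others answer: the openshift-ansible inventory does have per-host variables for pinning which addresses the cluster components bind to and advertise (openshift_ip, openshift_hostname, openshift_public_ip, openshift_public_hostname). A hypothetical host entry for this topology, with the public address and hostname as placeholders:

```ini
[masters]
openshift.private.net openshift_ip=192.168.1.10 openshift_hostname=openshift.private.net openshift_public_ip=172.18.9.50 openshift_public_hostname=openshift.example.com
```

Whether this prevents the multi-NIC breakage described above is exactly the question; it at least removes any ambiguity about which interface the installer should use.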

_______________________________________________
users mailing list
[email protected]
http://lists.openshift.redhat.com/openshiftmm/listinfo/users