Thank you very much for the extensive response, Samuel! I've found that I do have a DNS misconfiguration, so the CSR error from the subject line isn't caused by the OpenShift installer procedure itself.
Somehow (I haven't yet found the reason, but I'm still looking) dnsmasq fills its upstream DNS configuration with public nameservers instead of my internal DNS. So after the openshift-ansible playbook installs dnsmasq and calls the /etc/NetworkManager/dispatcher.d/99-origin-dns.sh script (restarting NetworkManager), all nodes end up with bad upstream nameservers in /etc/dnsmasq.d/origin-upstream-dns.conf and /etc/origin/node/resolv.conf. Even though the /etc/resolv.conf file on each host has the right nameserver and search domain, dnsmasq populates the OKD-related conf files above with a different nameserver. I think this comes down to dnsmasq/NetworkManager configuration; I'll have to dig in and figure out what isn't going as expected, and why. I believe these entries are served by the DHCP server, but I'm still looking for a way to address this.

Anyway, thanks again for the input, it put me on the right track! :)

Dan

On Sun, Jun 2, 2019 at 22:04, Samuel Martín Moro <[email protected]> wrote:

> Hi,
>
> This is quite puzzling... could you share your inventory with us? Make
> sure to obfuscate any sensitive data (LDAP/htpasswd credentials, among
> others). I'm mostly interested in potential openshift_node_groups edits,
> although something else might come up.
>
> At first glance, you are right, it sounds like a firewalling issue.
> Yet from your description, you did open all the required ports.
> I could suggest you check back on these and make sure your data is
> accurate, although I would assume it is.
> Also: if using CRI-O as a runtime, note that you would be missing port
> 10010, which should be opened on all nodes. Yet I don't think that one
> would be related to node registration against your master API.
>
> Another explanation could be related to DNS (can your infra/compute nodes
> properly resolve your masters' name? The contrary would be unusual, but it
> could still explain what's going on).
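A minimal sketch of these DNS checks, assuming (as discussed above) that the stray resolvers come in via DHCP: inspect the files the dispatcher script wrote, confirm the node can resolve the master, and pin the internal DNS through NetworkManager. The connection name "System eth0", the nameserver 10.0.0.10, and master.example.com are placeholders, not values from this thread:

```shell
# Inspect the upstream resolvers the 99-origin-dns.sh dispatcher wrote out
cat /etc/dnsmasq.d/origin-upstream-dns.conf
cat /etc/origin/node/resolv.conf

# Confirm the node can resolve the master through its local dnsmasq
dig +short master.example.com

# Show which DNS servers NetworkManager received from DHCP
nmcli -f IP4.DNS connection show "System eth0"

# Pin the internal nameserver and ignore the DHCP-provided ones, then
# re-activate the connection so the dispatcher script runs again
nmcli connection modify "System eth0" ipv4.dns 10.0.0.10 ipv4.ignore-auto-dns yes
nmcli connection up "System eth0"
```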
>
> As a general rule, at that stage, I would restart the origin-node service
> on those hosts that fail to register, keeping an eye on /var/log/messages
> (or journalctl -f).
> If that doesn't help, I might raise the log level in
> /etc/sysconfig/origin-node (there's a variable which defaults to 2; you
> can change it to 99, but beware it would give you a lot of logs and could
> saturate your disks at some point, so don't keep it like this for long).
>
> Dealing with large volumes of logs, note that OpenShift services tend to
> prefix messages by severity: you might "| grep -E 'E[0-9][0-9]'" to focus
> on error messages, or 'W[0-9][0-9]' for warnings, ...
>
> Your issue being potentially related to firewalling, I might also use
> tcpdump to look into what's being exchanged between nodes.
> Look for any packets with a SYN flag ("[S]") that are not followed by a
> SYN-ACK ("[S.]").
>
> Let us know how that goes.
>
> Good luck.
> Failing during the "Approve node certificates" step is relatively common,
> and it can have several causes, from node group configuration to DNS,
> firewalls, a broken TCP handshake, an MTU not allowing certificates to go
> through, ... We'll want to dig deeper to elucidate that issue.
>
> Regards.
>
> On Sat, Jun 1, 2019 at 12:19 PM Punga Dan <[email protected]> wrote:
>
>> Hello all!
>>
>> I'm hitting a problem when trying to install OKD 3.11 on one master, 2
>> infra and 2 compute nodes. The hosts are VMs running CentOS 7.
>> I've gone through the issues related to this subject:
>> https://access.redhat.com/solutions/3680401 suggests naming the hosts
>> with FQDNs. I tried it, and the same problem appeared for the same set
>> of hosts (all except the master).
>>
>> In my case the error shows up only for the 2 infra nodes and the 2
>> compute nodes, so not for the master.
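The restart / log-level / grep / tcpdump loop suggested above can be sketched roughly as follows. The DEBUG_LOGLEVEL variable name and port 8443 for the master API are assumptions; adjust them to your install:

```shell
# Restart the node service while watching its logs
journalctl -fu origin-node &
systemctl restart origin-node

# Raise kubelet verbosity (the variable name may differ in your release;
# revert to 2 afterwards, level 99 is extremely chatty)
sed -i 's/^DEBUG_LOGLEVEL=.*/DEBUG_LOGLEVEL=99/' /etc/sysconfig/origin-node
systemctl restart origin-node

# Kubernetes-style log lines carry a severity letter plus MMDD,
# e.g. "E0601 ..." for errors or "W0601 ..." for warnings
journalctl -u origin-node | grep -E ' [EW][0-9]{4} '

# Watch for SYN packets ("[S]") toward the master API that never get
# a SYN-ACK ("[S.]") back
tcpdump -nn -i any 'tcp port 8443 and tcp[tcpflags] & (tcp-syn|tcp-ack) != 0'
```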
>>
>> oc get nodes gives me just the master node, but I guess that's expected,
>> as the other OKD nodes are meant to be created by the process that
>> fails. Am I wrong?
>>
>> oc get csr gives me a result of 3 CSRs:
>> [root@master ~]# oc get csr
>> NAME        AGE   REQUESTOR            CONDITION
>> csr-4xjjb   24m   system:admin         Approved,Issued
>> csr-b6x45   24m   system:admin         Approved,Issued
>> csr-hgmpf   20m   system:node:master   Approved,Issued
>>
>> Here I believe I have 2 CSRs for system:admin because I ran
>> playbooks/openshift-node/join.yml a second time.
>>
>> The bootstrapping certificates on the master look fine(??):
>> [root@master ~]# ll /etc/origin/node/certificates/
>> total 20
>> -rw-------. 1 root root 2830 iun  1 11:30 kubelet-client-2019-06-01-11-30-04.pem
>> -rw-------. 1 root root 1135 iun  1 11:31 kubelet-client-2019-06-01-11-31-23.pem
>> lrwxrwxrwx. 1 root root   68 iun  1 11:31 kubelet-client-current.pem -> /etc/origin/node/certificates/kubelet-client-2019-06-01-11-31-23.pem
>> -rw-------. 1 root root 1179 iun  1 11:35 kubelet-server-2019-06-01-11-35-42.pem
>> lrwxrwxrwx. 1 root root   68 iun  1 11:35 kubelet-server-current.pem -> /etc/origin/node/certificates/kubelet-server-2019-06-01-11-35-42.pem
>>
>> I've rechecked the open ports, thinking the issue lies in some
>> network-related config:
>> - all hosts have the node-related ports open: 53/udp, 10250/tcp, 4789/udp
>> - master (with etcd): 8053/udp+tcp, 2049/udp+tcp, 8443/tcp, 8444/tcp,
>>   4789/udp, 53/udp
>> - infra has, on top of the node ones, the ports related to the
>>   router/routes and the logging components it will host
>> The chosen SDN is
>> os_sdn_network_plugin_name='redhat/openshift-ovs-multitenant', with no
>> extra config in the inventory file. (Do I need any?)
>>
>> Any hints about where and what to check would be much appreciated!
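To double-check a setup like the one above, a quick connectivity and CSR sweep might look like this (the hostnames are placeholders; on CentOS 7, nc comes from the nmap-ncat package):

```shell
# From the master, check that each node's kubelet port is reachable
for host in infra1 infra2 compute1 compute2; do
    nc -zv -w3 "$host" 10250 || echo "WARN: $host:10250 unreachable"
done

# ...and, from each node, that the master API answers
nc -zv -w3 master.example.com 8443

# Nodes that never registered usually leave Pending CSRs behind
oc get csr

# Approve anything still pending
oc get csr -o name | xargs -r oc adm certificate approve
```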
>>
>> Best regards,
>> Dan Pungă
>> _______________________________________________
>> users mailing list
>> [email protected]
>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>>
>
>
> --
> Samuel Martín Moro
> {EPITECH.} 2011
>
> "Nobody wants to say how this works.
>  Maybe nobody knows ..."
>                          Xorg.conf(5)
