Re: OKD3.11 install blocked - Could not find csr for nodes

Punga Dan Mon, 03 Jun 2019 08:19:04 -0700

Thank you very much for the extensive response, Samuel!

I've found that I do have a DNS misconfiguration, so the CSR error from the
title is not caused by the OpenShift installer procedure itself.


Somehow (I haven't found the reason yet, but I'm still looking) dnsmasq
fills its upstream DNS configuration with public nameservers instead of my
"internal" DNS.
The openshift-ansible playbook installs dnsmasq and calls the
/etc/NetworkManager/dispatcher.d/99-origin-dns.sh script (restarting
NetworkManager). After that, all nodes end up with "bad" upstream
nameservers in /etc/dnsmasq.d/origin-upstream-dns.conf and
/etc/origin/node/resolv.conf.
Even though /etc/resolv.conf on each host has the right nameserver and
search domain, dnsmasq populates the OKD-related conf files above with a
different nameserver.
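
One way to make the mismatch visible is to compare the nameservers in
/etc/resolv.conf with the upstreams written to the OKD-related files. This
is only a sketch: the function names are made up for illustration, and it
assumes that origin-upstream-dns.conf holds plain "server=<ip>" lines and
that the resolv.conf-style files use "nameserver <ip>" lines.

```shell
#!/usr/bin/env bash
# Sketch only. Assumed formats (not confirmed in this thread):
#   resolv.conf-style files:   "nameserver <ip>"
#   origin-upstream-dns.conf:  "server=<ip>"

# Print the nameserver IPs from a resolv.conf-style file, sorted.
resolv_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "$1" | sort -u
}

# Print the upstream IPs from a dnsmasq "server=<ip>" file, sorted.
dnsmasq_upstreams() {
    sed -n 's/^server=//p' "$1" | sort -u
}

# Return 0 if both files name the same servers; otherwise print both
# lists and return 1.
check_upstreams() {
    local host_ns upstream_ns
    host_ns=$(resolv_nameservers "$1")
    upstream_ns=$(dnsmasq_upstreams "$2")
    if [ "$host_ns" = "$upstream_ns" ]; then
        echo "OK: upstreams match"
    else
        echo "MISMATCH"
        echo "  $1: ${host_ns//$'\n'/ }"
        echo "  $2: ${upstream_ns//$'\n'/ }"
        return 1
    fi
}

# Example, to be run on a node:
#   check_upstreams /etc/resolv.conf /etc/dnsmasq.d/origin-upstream-dns.conf
#   resolv_nameservers /etc/origin/node/resolv.conf
```

Running it on each node before and after the playbook should show at which
point the upstream list changes.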

I think this is related to dnsmasq/NetworkManager-specific configuration. I
will have to look into it and figure out what is not going as expected and
why. I believe these nameservers are served by the DHCP server, but I'm
still looking for a way to address this.

Anyway thanks again for the input, it put me on the right track! :)

Dan

On Sun, 2 Jun 2019 at 22:04, Samuel Martín Moro <[email protected]>
wrote:

> Hi,
>
>
> This is quite puzzling. Could you share your inventory with us? Make
> sure to obfuscate any sensitive data (LDAP/htpasswd credentials, among
> others).
> I'm mostly interested in any changes to openshift_node_groups, although
> something else might come up.
>
>
> At first glance, you are right, it sounds like a firewalling issue.
> Yet from your description, you did open all required ports.
> I would suggest you check these again and make sure your data is accurate,
> although I would assume it is.
> Also: if using CRI-O as a runtime, note that you would be missing port
> 10010, which should be opened on all nodes. Yet I don't think that one
> would be related to node registration against your master API.
>
> Another explanation could be related to DNS. Can your infra/compute nodes
> properly resolve your master's name? It would be unusual if they could
> not, but that could still explain what's going on.
>
> As a general rule, at that stage, I would restart the origin-node service
> on the hosts that fail to register, keeping an eye on /var/log/messages
> (or journalctl -f).
> If that doesn't help, I might raise the log level in
> /etc/sysconfig/origin-node (there's a variable which defaults to 2; you can
> change it to 99). Beware that it would give you a lot of logs and could
> saturate your disks at some point, so don't keep it like this for long.
>
> When dealing with large volumes of logs, note that OpenShift services tend
> to prefix messages based on severity: you might be able to
> "| grep -E 'E[0-9][0-9]'" to focus on error messages, or W[0-9][0-9] for
> warnings, ...
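
The grep suggestion above could be wrapped in a small helper. This is a
sketch, not something from the original thread: filter_severity is a
made-up name, and the pattern assumes the usual glog/klog prefix (severity
letter, four-digit month and day, then a time stamp), which is a bit
stricter than the two-digit pattern quoted above.

```shell
#!/usr/bin/env bash
# Keep only log lines of one severity read from stdin.
# $1 is the severity letter: E (error), W (warning), I (info), F (fatal).
# Assumes a glog/klog-style prefix such as "E0601 11:35:42.123456".
filter_severity() {
    grep -E "(^|[[:space:]])$1[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}"
}

# Example, to be run on a node that fails to register:
#   journalctl -u origin-node --no-pager | filter_severity E
#   filter_severity W < /var/log/messages
```

Anchoring the letter to the start of a field avoids matching stray "E12"
sequences elsewhere in a message.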
>
> Since your issue is potentially related to firewalling, I might also use
> tcpdump to look into what's being exchanged between nodes.
> Look for any packets with a SYN flag ("[S]") that are not followed by
> a SYN-ACK ("[S.]").
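
Spotting a SYN without a SYN-ACK by eye gets tedious, so the check can be
sketched as a filter over tcpdump's text output. The helper name
unanswered_syns is made up, and the field positions assume the default
one-line format of a capture on a single interface with -nn, for example
"IP 10.0.0.2.41000 > 10.0.0.1.8443: Flags [S], ...". Other tcpdump options
may shift the fields.

```shell
#!/usr/bin/env bash
# Read tcpdump text output on stdin and print "src > dst" for every SYN
# that was never answered by a SYN-ACK from the other side.
unanswered_syns() {
    awk '
        $6 == "Flags" && $7 == "[S]," {
            dst = $5; sub(/:$/, "", dst)
            pending[$3 " > " dst] = 1
        }
        $6 == "Flags" && $7 == "[S.]," {
            dst = $5; sub(/:$/, "", dst)
            # A SYN-ACK travels in the reverse direction of its SYN.
            delete pending[dst " > " $3]
        }
        END { for (k in pending) print k }
    ' | sort
}

# Example, to be run as root on the master; the interface name eth0 is an
# assumption, 8443 is the master API port mentioned in this thread:
#   tcpdump -nn -i eth0 -c 200 'tcp port 8443' | unanswered_syns
```

Any address pair it prints is a connection attempt that got no reply within
the capture, which points at a firewall or routing problem on that path.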
>
>
> Let us know how that goes,
>
>
> Good luck.
> Failing during the "Approve node certificate" steps is relatively common,
> and could have several causes: node groups configuration, DNS,
> firewalls, a broken TCP handshake, an MTU not allowing certificates to go
> through, ... We'll want to dig deeper to elucidate that issue.
>
>
> Regards.
>
> On Sat, Jun 1, 2019 at 12:19 PM Punga Dan <[email protected]> wrote:
>
>> Hello all!
>>
>> I'm hitting a problem when trying to install OKD 3.11 on one master, 2
>> infra and 2 compute nodes. The hosts are VMs that run CentOS 7.
>> I've gone through the issues related to this subject:
>> https://access.redhat.com/solutions/3680401 which suggests naming the
>> hosts by FQDN. I tried that, and the same problem appears for the same set
>> of hosts (all except the master).
>>
>> In my case the error occurs only for the 2 infra nodes and 2 compute
>> nodes, not for the master.
>>
>> oc get nodes gives me just the master node, but I guess this is expected,
>> as the other OKD nodes are to be created by the process that fails. Am I
>> wrong?
>>
>> oc get csr gives me a result of 3 csrs:
>> [root@master ~]# oc get csr
>> NAME        AGE       REQUESTOR            CONDITION
>> csr-4xjjb   24m       system:admin         Approved,Issued
>> csr-b6x45   24m       system:admin         Approved,Issued
>> csr-hgmpf   20m       system:node:master   Approved,Issued
>>
>> Here I believe I have 2 CSRs for system:admin because I ran
>> the playbooks/openshift-node/join.yml a second time.
>>
>> The bootstrapping certificates on the master look fine(??)
>> [root@master ~]# ll /etc/origin/node/certificates/
>> total 20
>> -rw-------. 1 root root 2830 iun  1 11:30 kubelet-client-2019-06-01-11-30-04.pem
>> -rw-------. 1 root root 1135 iun  1 11:31 kubelet-client-2019-06-01-11-31-23.pem
>> lrwxrwxrwx. 1 root root   68 iun  1 11:31 kubelet-client-current.pem -> /etc/origin/node/certificates/kubelet-client-2019-06-01-11-31-23.pem
>> -rw-------. 1 root root 1179 iun  1 11:35 kubelet-server-2019-06-01-11-35-42.pem
>> lrwxrwxrwx. 1 root root   68 iun  1 11:35 kubelet-server-current.pem -> /etc/origin/node/certificates/kubelet-server-2019-06-01-11-35-42.pem
>>
>> I've rechecked the open ports, thinking the issue lies in some
>> network-related config.
>> - all hosts have the node-related ports opened: 53/udp, 10250/tcp,
>> 4789/udp
>> - master (with etcd): 8053/udp+tcp, 2049/udp+tcp, 8443/tcp, 8444/tcp,
>> 4789/udp, 53/udp
>> - infra has, on top of the node ones, the ports related to the
>> router/routes and logging components which it will host
>> The chosen SDN is
>> os_sdn_network_plugin_name='redhat/openshift-ovs-multitenant', with no
>> extra config in the inventory file. (Do I need any?)
>>
>>
>> Any hints about where and what to check would be much appreciated!
>>
>> Best regards,
>> Dan Pungă
>> _______________________________________________
>> users mailing list
>> [email protected]
>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>>
>
>
> --
> Samuel Martín Moro
> {EPITECH.} 2011
>
> "Nobody wants to say how this works.
>  Maybe nobody knows ..."
>                       Xorg.conf(5)
>

