Hi Dan,

Which openshift-ansible release tag did you use?


Cheers,
Dani

On Mon, Jun 3, 2019 at 4:18 PM Punga Dan <dan.pu...@gmail.com> wrote:

> Thank you very much for the extensive response, Samuel!
>
> I've found that I do have a DNS misconfiguration, so the CSR error from
> the subject line is not actually caused by the OpenShift installer
> procedure.
>
> Somehow (I haven't yet found the reason, but I'm still looking for it)
> dnsmasq fills the upstream DNS configuration with some public nameservers
> instead of my "internal" DNS.
> So after the openshift-ansible playbook responsible for this installs
> dnsmasq and calls the /etc/NetworkManager/dispatcher.d/99-origin-dns.sh
> script (which restarts NetworkManager), all nodes end up with "bad"
> upstream nameservers (in the /etc/dnsmasq.d/origin-upstream-dns.conf and
> /etc/origin/node/resolv.conf files).
> Even though the /etc/resolv.conf file on each host has the right
> nameserver and search domain, dnsmasq populates the OKD-related conf
> files above with a different nameserver.
>
> I think this is related to dnsmasq/NetworkManager-specific configuration;
> I'll have to look into it and figure out what's not going as expected and
> why. I believe these nameservers are served by the DHCP server, but I'm
> still looking for a way to address this. A sketch of the workaround I'm
> considering is below.
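>
> A minimal sketch of that workaround, assuming the interface is eth0 and
> the internal nameserver is 10.0.0.10 (both hypothetical): tell
> NetworkManager to ignore the DHCP-supplied nameservers so dnsmasq only
> ever sees the internal one.
>
>   # /etc/sysconfig/network-scripts/ifcfg-eth0
>   PEERDNS=no        # don't let DHCP overwrite the resolver settings
>   DNS1=10.0.0.10    # hypothetical internal nameserver
>
>   systemctl restart NetworkManager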
>
> Anyway, thanks again for the input; it put me on the right track! :)
>
> Dan
>
> On Sun, Jun 2, 2019 at 10:04 PM Samuel Martín Moro <faus...@gmail.com>
> wrote:
>
>> Hi,
>>
>>
>> This is quite puzzling... could you share your inventory with us? Make
>> sure to obfuscate any sensitive data (LDAP/htpasswd credentials, among
>> others).
>> I'm mostly interested in potential edits to openshift_node_groups,
>> although something else might come up.
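>>
>> For reference, a sketch from memory of the stock 3.11 definition (folded
>> here for readability; it would be a single line in an INI inventory) -
>> any deviation from it on your side would be interesting:
>>
>>   openshift_node_groups=[
>>     {'name': 'node-config-master',  'labels': ['node-role.kubernetes.io/master=true']},
>>     {'name': 'node-config-infra',   'labels': ['node-role.kubernetes.io/infra=true']},
>>     {'name': 'node-config-compute', 'labels': ['node-role.kubernetes.io/compute=true']}]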
>>
>>
>> At first glance, you are right, it sounds like a firewalling issue.
>> Yet from your description, you did open all the required ports.
>> I would suggest you check back on these and make sure your data is
>> accurate - although I would assume it is.
>> Also: if using CRI-O as the runtime, note that you would be missing port
>> 10010, which should be opened on all nodes. Yet I don't think that one
>> would be related to node registration against your master API.
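>>
>> If you do run CRI-O, a sketch for opening that port, assuming firewalld
>> (to be run on every node):
>>
>>   firewall-cmd --permanent --add-port=10010/tcp
>>   firewall-cmd --reload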
>>
>> Another explanation could be related to DNS (can your infra/compute nodes
>> properly resolve your master's name? The contrary would be unusual, but it
>> could still explain what's going on).
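>>
>> A quick check from each node (master.example.com being a placeholder for
>> your actual master hostname):
>>
>>   dig +short master.example.com     # answer from the configured resolver
>>   getent hosts master.example.com   # answer from the full resolver chain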
>>
>> As a general rule, at that stage, I would restart the origin-node service
>> on those hosts that fail to register, keeping an eye on /var/log/messages
>> (or journalctl -f).
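>>
>> Something along these lines:
>>
>>   systemctl restart origin-node
>>   journalctl -u origin-node -f    # follow the node service logs live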
>> If that doesn't help, I might raise the log level in
>> /etc/sysconfig/origin-node (there's a variable which defaults to 2; you
>> can change it to 99, but beware: it would give you a lot of logs and
>> could saturate your disks at some point, so don't keep it like this over
>> a long period).
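>>
>> If I remember correctly the variable is DEBUG_LOGLEVEL (do double-check
>> the file before relying on that name), so something like:
>>
>>   sed -i 's/^DEBUG_LOGLEVEL=.*/DEBUG_LOGLEVEL=99/' /etc/sysconfig/origin-node
>>   systemctl restart origin-node
>>   # remember to set it back to 2 once you're done debugging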
>>
>> When dealing with large volumes of logs, note that openshift services
>> tend to prefix messages based on severity: you might "grep -E
>> 'E[0-9][0-9]'" to focus on error messages, or 'W[0-9][0-9]' for
>> warnings, ...
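>>
>> For example (the E/W prefix comes from the glog-style format used by
>> kubernetes components, e.g. "E0601 11:31:23.000000 ..."):
>>
>>   journalctl -u origin-node | grep -E 'E[0-9][0-9]'   # error lines
>>   journalctl -u origin-node | grep -E 'W[0-9][0-9]'   # warning lines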
>>
>> Your issue being potentially related to firewalling, I might also use
>> tcpdump to look into what's being exchanged between nodes.
>> Look for any packet with a SYN flag ("[S]") that is not followed by a
>> SYN-ACK ("[S.]").
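>>
>> A sketch, watching handshakes against the master API (8443 per your
>> setup):
>>
>>   tcpdump -nn -i any 'tcp port 8443 and (tcp[tcpflags] & tcp-syn != 0)'
>>
>> Both SYN ("[S]") and SYN-ACK ("[S.]") packets match that filter; a SYN
>> with no matching SYN-ACK points at something dropping the traffic.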
>>
>>
>> Let us know how that goes,
>>
>>
>> Good luck!
>> Failing during the "Approve node certificates" step is relatively common,
>> and can have several causes: node group configuration, DNS, firewalls, a
>> broken TCP handshake, an MTU too low for certificates to go through, ...
>> We'll want to dig deeper to elucidate that issue.
>>
>>
>> Regards.
>>
>> On Sat, Jun 1, 2019 at 12:19 PM Punga Dan <dan.pu...@gmail.com> wrote:
>>
>>> Hello all!
>>>
>>> I'm hitting a problem when trying to install OKD 3.11 on one master, 2
>>> infra, and 2 compute nodes. The hosts are VMs running CentOS 7.
>>> I've gone through the issues related to this subject:
>>> https://access.redhat.com/solutions/3680401 suggests naming the hosts
>>> with their FQDNs. I tried that, and the same problem appeared for the
>>> same set of hosts (all except the master).
>>>
>>> In my case the error occurs only for the 2 infra nodes and the 2 compute
>>> nodes, not for the master.
>>>
>>> oc get nodes gives me just the master node, but I guess that's expected,
>>> as the other OKD nodes have yet to be created by the process that fails.
>>> Am I wrong?
>>>
>>> oc get csr gives me a result of 3 csrs:
>>> [root@master ~]# oc get csr
>>> NAME        AGE       REQUESTOR            CONDITION
>>> csr-4xjjb   24m       system:admin         Approved,Issued
>>> csr-b6x45   24m       system:admin         Approved,Issued
>>> csr-hgmpf   20m       system:node:master   Approved,Issued
>>>
>>> Here I believe I have 2 CSRs for system:admin because I ran
>>> the playbooks/openshift-node/join.yml playbook a second time.
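>>>
>>> (For the record, if any CSR were stuck in Pending, my understanding is
>>> it could be approved by hand with something like:
>>>
>>>   oc get csr -o name | xargs oc adm certificate approve
>>>
>>> but here everything already shows Approved,Issued.)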
>>>
>>> The bootstrapping certificates on the master look fine (??):
>>> [root@master ~]# ll /etc/origin/node/certificates/
>>> total 20
>>> -rw-------. 1 root root 2830 Jun  1 11:30
>>> kubelet-client-2019-06-01-11-30-04.pem
>>> -rw-------. 1 root root 1135 Jun  1 11:31
>>> kubelet-client-2019-06-01-11-31-23.pem
>>> lrwxrwxrwx. 1 root root   68 Jun  1 11:31 kubelet-client-current.pem ->
>>> /etc/origin/node/certificates/kubelet-client-2019-06-01-11-31-23.pem
>>> -rw-------. 1 root root 1179 Jun  1 11:35
>>> kubelet-server-2019-06-01-11-35-42.pem
>>> lrwxrwxrwx. 1 root root   68 Jun  1 11:35 kubelet-server-current.pem ->
>>> /etc/origin/node/certificates/kubelet-server-2019-06-01-11-35-42.pem
>>>
>>> I've rechecked the open ports, thinking the issue lies in some
>>> network-related config (a quick verification sketch follows the list):
>>> - all hosts have the node-related ports open: 53/udp, 10250/tcp,
>>> 4789/udp
>>> - master (with etcd): 8053/udp+tcp, 2049/udp+tcp, 8443/tcp, 8444/tcp,
>>> 4789/udp, 53/udp
>>> - infra has, on top of the node ones, the ports related to the
>>> router/routes and the logging components it will host
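>>>
>>> A quick way to double-check this on each host (assuming firewalld):
>>>
>>>   firewall-cmd --list-ports      # explicitly opened ports
>>>   firewall-cmd --list-services   # services whitelisted by name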
>>> The chosen SDN
>>> is os_sdn_network_plugin_name='redhat/openshift-ovs-multitenant', with no
>>> extra config in the inventory file. (Do I need any?)
>>>
>>>
>>> Any hints about where and what to check would be much appreciated!
>>>
>>> Best regards,
>>> Dan Pungă
>>
>>
>> --
>> Samuel Martín Moro
>> {EPITECH.} 2011
>>
>> "Nobody wants to say how this works.
>>  Maybe nobody knows ..."
>>                       Xorg.conf(5)
>>
_______________________________________________
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
