Hi Dani,

I'm using openshift-ansible release-3.11 tag.

Dan

On Tue, Jun 4, 2019 at 9:54 AM Daniel Comnea <[email protected]> wrote:

> Hi Dan,
>
> Which openshift-ansible release tag have you used ?
>
>
> Cheers,
> Dani
>
> On Mon, Jun 3, 2019 at 4:18 PM Punga Dan <[email protected]> wrote:
>
>> Thank you very much for the extensive response, Samuel!
>>
>> I've found that I do have a DNS misconfiguration, so the CSR error in
>> the subject isn't actually caused by the OpenShift installer procedure.
>>
>> Somehow (I haven't yet found the reason, but I'm still looking for it)
>> dnsmasq fills the upstream DNS configuration with some public nameservers
>> instead of my "internal" DNS.
>> So after the relevant openshift-ansible play installs dnsmasq and calls
>> the /etc/NetworkManager/dispatcher.d/99-origin-dns.sh script (by
>> restarting NetworkManager), all nodes end up with "bad" upstream
>> nameservers in /etc/dnsmasq.d/origin-upstream-dns.conf and
>> /etc/origin/node/resolv.conf.
>> Even though /etc/resolv.conf on each host has the right nameserver and
>> search domain, dnsmasq populates the OKD-related conf files above with a
>> different nameserver.
>>
>> I think this is related to dnsmasq/NetworkManager-specific
>> configuration... I'll have to look into it and figure out what's not
>> going as expected and why. I believe these nameservers are handed out by
>> the DHCP server, but I'm still looking for a way to address this.
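>>
>> For reference, this is roughly how I've been comparing what
>> NetworkManager/DHCP hands out with what dnsmasq ended up using (the
>> interface name below is just a placeholder for my setup):
>>
>>   # DNS servers NetworkManager got for the active connection
>>   nmcli -f IP4.DNS device show eth0
>>   # what the 99-origin-dns.sh dispatcher wrote for dnsmasq and the node
>>   cat /etc/dnsmasq.d/origin-upstream-dns.conf /etc/origin/node/resolv.conf
>>   # what the host itself resolves with
>>   cat /etc/resolv.conf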
>>
>> Anyway thanks again for the input, it put me on the right track! :)
>>
>> Dan
>>
>> On Sun, Jun 2, 2019 at 10:04 PM Samuel Martín Moro <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>>
>>> This is quite puzzling... could you share your inventory with us? Make
>>> sure to obfuscate any sensitive data (LDAP/htpasswd credentials, among
>>> others).
>>> I'm mostly interested in potential openshift_node_groups edits, although
>>> something else might come up (?)
>>>
>>>
>>> At first glance, you're right, it sounds like a firewalling issue.
>>> Yet from your description, you did open all required ports.
>>> I'd suggest you check back on these and make sure your data is accurate,
>>> although I would assume it is.
>>> Also: if using CRI-O as a runtime, note that you would be missing port
>>> 10010, which should be opened on all nodes. Yet I don't think that one
>>> would be related to node registration against your master API.
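>>>
>>> A quick way to double-check (hostname and the 8443 API port below are
>>> placeholders, adjust them to your inventory):
>>>
>>>   # on a node that fails to register: what does the firewall allow?
>>>   firewall-cmd --list-all
>>>   # from that node: does the TCP/TLS handshake to the master API complete?
>>>   curl -kv https://master.example.com:8443/healthz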
>>>
>>> Another explanation could be related to DNS (can your infra/compute
>>> nodes properly resolve your master's name? the contrary would be
>>> unusual, but it could still explain what's going on).
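>>>
>>> Something as simple as this, run from an infra/compute node, would tell
>>> (master hostname is a placeholder):
>>>
>>>   getent hosts master.example.com
>>>   dig +short master.example.com   # dig ships with bind-utils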
>>>
>>> As a general rule, at that stage, I would restart the origin-node
>>> service on the hosts that fail to register, keeping an eye on
>>> /var/log/messages (or journalctl -f).
>>> If that doesn't help, I might raise the log level in
>>> /etc/sysconfig/origin-node (there's a variable that defaults to 2; you
>>> can change it to 99, but beware it will produce a lot of logs and could
>>> saturate your disks at some point, so don't keep it like that for long).
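>>>
>>> In practice, on one of the failing nodes, that's something along these
>>> lines:
>>>
>>>   systemctl restart origin-node
>>>   journalctl -u origin-node -f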
>>>
>>> When dealing with large volumes of logs, note that OpenShift services
>>> tend to prefix messages based on severity: you can pipe through
>>> grep -E 'E[0-9][0-9]' to focus on error messages, or 'W[0-9][0-9]' for
>>> warnings, ...
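>>>
>>> For instance (a rough sketch, adjust the unit/pattern to what you
>>> actually see in your logs):
>>>
>>>   journalctl -u origin-node --no-pager | grep -E 'E[0-9][0-9]'   # errors
>>>   journalctl -u origin-node --no-pager | grep -E 'W[0-9][0-9]'   # warnings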
>>>
>>> Your issue being potentially related to firewalling, I might also use
>>> tcpdump to look into what's being exchanged between nodes.
>>> Look for any packet with a SYN flag ("[S]") that is not followed by a
>>> SYN-ACK ("[S.]").
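>>>
>>> A starting point, assuming the API listens on 8443 as in your port list
>>> (interface name is a placeholder, run it on a failing node):
>>>
>>>   tcpdump -nn -i eth0 'tcp port 8443 and tcp[tcpflags] & (tcp-syn|tcp-ack) != 0'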
>>>
>>>
>>> Let us know how that goes,
>>>
>>>
>>> Good luck.
>>> Failing during the "Approve node certificate" step is relatively
>>> common, and can have several causes, from node group configuration to
>>> DNS, firewalls, a broken TCP handshake, or an MTU not allowing the
>>> certificates to go through, ... we'll want to dig deeper to elucidate
>>> that issue.
>>>
>>>
>>> Regards.
>>>
>>> On Sat, Jun 1, 2019 at 12:19 PM Punga Dan <[email protected]> wrote:
>>>
>>>> Hello all!
>>>>
>>>> I'm hitting a problem when trying to install OKD 3.11 on one master, 2
>>>> infra and 2 compute nodes. The hosts are VMs running CentOS 7.
>>>> I've gone through the issues related to this subject, including
>>>> https://access.redhat.com/solutions/3680401, which suggests naming the
>>>> hosts with their FQDNs. I tried that, but the same problem appears for
>>>> the same set of hosts (all except the master).
>>>>
>>>> In my case the error shows up only for the 2 infra nodes and 2 compute
>>>> nodes, so not for the master.
>>>>
>>>> oc get nodes gives me just the master node, but I guess that's
>>>> expected, as the other OKD nodes are yet to be created by the process
>>>> that fails. Am I wrong?
>>>>
>>>> oc get csr gives me 3 CSRs:
>>>> [root@master ~]# oc get csr
>>>> NAME        AGE       REQUESTOR            CONDITION
>>>> csr-4xjjb   24m       system:admin         Approved,Issued
>>>> csr-b6x45   24m       system:admin         Approved,Issued
>>>> csr-hgmpf   20m       system:node:master   Approved,Issued
>>>>
>>>> Here I believe I have 2 CSRs for system:admin because I ran
>>>> playbooks/openshift-node/join.yml a second time.
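>>>>
>>>> (For what it's worth, I know pending requests could be approved by hand
>>>> with something along the lines of:
>>>>   oc get csr -o name | xargs oc adm certificate approve
>>>> but there is nothing in Pending state here for me to approve.)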
>>>>
>>>> The bootstrapping certificates on the master look fine (??):
>>>> [root@master ~]# ll /etc/origin/node/certificates/
>>>> total 20
>>>> -rw-------. 1 root root 2830 iun  1 11:30
>>>> kubelet-client-2019-06-01-11-30-04.pem
>>>> -rw-------. 1 root root 1135 iun  1 11:31
>>>> kubelet-client-2019-06-01-11-31-23.pem
>>>> lrwxrwxrwx. 1 root root   68 iun  1 11:31 kubelet-client-current.pem ->
>>>> /etc/origin/node/certificates/kubelet-client-2019-06-01-11-31-23.pem
>>>> -rw-------. 1 root root 1179 iun  1 11:35
>>>> kubelet-server-2019-06-01-11-35-42.pem
>>>> lrwxrwxrwx. 1 root root   68 iun  1 11:35 kubelet-server-current.pem ->
>>>> /etc/origin/node/certificates/kubelet-server-2019-06-01-11-35-42.pem
>>>>
>>>> I've rechecked the open ports, thinking the issue lies in some
>>>> network-related config (rough check commands after this list):
>>>> - all hosts have the node-related ports open: 53/udp, 10250/tcp,
>>>> 4789/udp
>>>> - master (with etcd): 8053/udp+tcp, 2049/udp+tcp, 8443/tcp, 8444/tcp,
>>>> 4789/udp, 53/udp
>>>> - infra nodes have, on top of the node ports, the ports for the
>>>> router/routes and logging components they will host
>>>> The chosen SDN is
>>>> os_sdn_network_plugin_name='redhat/openshift-ovs-multitenant', with no
>>>> extra config in the inventory file. (Do I need any?)
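>>>>
>>>> For the record, this is roughly how I probed those ports from the shell
>>>> (hostnames below are placeholders for my actual ones):
>>>>
>>>>   # from an infra/compute node: does the master API port answer at all?
>>>>   timeout 3 bash -c '</dev/tcp/master.example.com/8443' && echo "8443 open"
>>>>   # from the master: is the kubelet port on a node reachable?
>>>>   timeout 3 bash -c '</dev/tcp/node1.example.com/10250' && echo "10250 open"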
>>>>
>>>>
>>>> Any hints about where and what to check would be much appreciated!
>>>>
>>>> Best regards,
>>>> Dan Pungă
>>>
>>>
>>> --
>>> Samuel Martín Moro
>>> {EPITECH.} 2011
>>>
>>> "Nobody wants to say how this works.
>>>  Maybe nobody knows ..."
>>>                       Xorg.conf(5)
>>>
>
_______________________________________________
users mailing list
[email protected]
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
