Hi Dani, I'm using the openshift-ansible release-3.11 tag.
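
(For reference, a sketch of how that release is typically checked out, assuming the upstream GitHub repo:)

git clone https://github.com/openshift/openshift-ansible.git
cd openshift-ansible
git checkout release-3.11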
Dan

On Tue, Jun 4, 2019 at 09:54, Daniel Comnea <[email protected]> wrote:

> Hi Dan,
>
> Which openshift-ansible release tag have you used?
>
> Cheers,
> Dani
>
> On Mon, Jun 3, 2019 at 4:18 PM Punga Dan <[email protected]> wrote:
>
>> Thank you very much for the extensive response, Samuel!
>>
>> I've found that I do have a DNS misconfiguration, so the CSR error from
>> the title isn't caused by anything in the OpenShift installer procedure
>> itself.
>>
>> Somehow (I haven't yet found the reason, but I'm still looking for it)
>> dnsmasq fills the upstream DNS configuration with some public
>> nameservers instead of my "internal" DNS.
>> So after the openshift-ansible playbook installs dnsmasq and calls the
>> /etc/NetworkManager/dispatcher.d/99-origin-dns.sh script (restarting
>> NetworkManager), all nodes end up with "bad" upstream nameservers (in
>> the /etc/dnsmasq.d/origin-upstream-dns.conf and
>> /etc/origin/node/resolv.conf files).
>> Even though the /etc/resolv.conf file on each host has the right
>> nameserver and search domain, dnsmasq populates the OKD-related conf
>> files above with a different nameserver.
>>
>> I think this is related to dnsmasq/NetworkManager-specific
>> configuration... I'll have to look into it and figure out what's not
>> going as expected, and why. I believe these nameservers are served by
>> the DHCP server, but I'm still looking for a way to address this.
>>
>> Anyway, thanks again for the input, it put me on the right track! :)
>>
>> Dan
>>
>> On Sun, Jun 2, 2019 at 22:04, Samuel Martín Moro <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> This is quite puzzling... could you share your inventory with us? Make
>>> sure to obfuscate any sensitive data (ldap/htpasswd credentials, among
>>> others). I'm mostly interested in potential openshift_node_groups
>>> edits, although something else might come up.
>>>
>>> At first glance, you are right: it sounds like a firewalling issue.
>>> Yet from your description, you did open all required ports. I would
>>> suggest you check back on these and make sure your data is accurate -
>>> although I would assume it is.
>>> Also: if using CRI-O as a runtime, note that you would be missing port
>>> 10010, which should be opened on all nodes. Yet I don't think that one
>>> would be related to node registrations against your master API.
>>>
>>> Another explanation could be related to DNS (can your infra/compute
>>> nodes properly resolve your master's name? The contrary would be
>>> unusual, but it could still explain what's going on).
>>>
>>> As a general rule, at that stage, I would restart the origin-node
>>> service on the hosts that fail to register, keeping an eye on
>>> /var/log/messages (or journalctl -f).
>>> If that doesn't help, I might raise log levels in
>>> /etc/sysconfig/origin-node (there's a variable which defaults to 2;
>>> you can change it to 99, but beware: it produces a lot of logs and
>>> could saturate your disks at some point, so don't keep it like this
>>> over a long period).
>>>
>>> When dealing with large volumes of logs, note that OpenShift services
>>> tend to prefix messages by severity: you might "| grep -E
>>> 'E[0-9][0-9]'" to focus on error messages, or 'W[0-9][0-9]' for
>>> warnings, ...
>>>
>>> Your issue being potentially related to firewalling, I might also use
>>> tcpdump to look into what's being exchanged between nodes.
>>> Look for any packets with a SYN flag ("[S]") that are not followed by
>>> a SYN-ACK ("[S.]").
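>>>
>>> (Putting that together as a sketch - the DEBUG_LOGLEVEL variable name
>>> and the eth0/8443 values are assumptions here, adjust them to your
>>> environment:)
>>>
>>> # 1. raise verbosity on a node that fails to register
>>> sed -i 's/^DEBUG_LOGLEVEL=.*/DEBUG_LOGLEVEL=99/' /etc/sysconfig/origin-node
>>> systemctl restart origin-node
>>>
>>> # 2. follow the logs, keeping only error (Exxxx) and warning (Wxxxx) records
>>> journalctl -fu origin-node | grep -E '(E|W)[0-9][0-9]'
>>>
>>> # 3. check the TCP handshake toward the master API
>>> tcpdump -nn -i eth0 'tcp port 8443 and (tcp[tcpflags] & (tcp-syn|tcp-ack) != 0)'
>>>
>>> # remember to set DEBUG_LOGLEVEL back to 2 once you're done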
>>>
>>> Let us know how that goes,
>>>
>>> Good luck.
>>> Failing during the "Approve node certificate" step is relatively
>>> common, and can have several causes, from node groups configuration to
>>> DNS, firewalls, a broken TCP handshake, or an MTU that doesn't allow
>>> certificates to go through, ... we'll want to dig deeper to elucidate
>>> that issue.
>>>
>>> Regards.
>>>
>>> On Sat, Jun 1, 2019 at 12:19 PM Punga Dan <[email protected]> wrote:
>>>
>>>> Hello all!
>>>>
>>>> I'm hitting a problem when trying to install OKD 3.11 on one master,
>>>> two infra, and two compute nodes. The hosts are VMs running CentOS 7.
>>>> I've gone through the issues related to this subject:
>>>> https://access.redhat.com/solutions/3680401 suggests naming the hosts
>>>> with FQDNs. I tried that, and the same problem appeared for the same
>>>> set of hosts (all except the master).
>>>>
>>>> In my case the error occurs only for the 2 infra nodes and 2 compute
>>>> nodes, not for the master.
>>>>
>>>> oc get nodes gives me just the master node, but I guess that's
>>>> expected, as the other OKD nodes have yet to be created by the
>>>> process that fails. Am I wrong?
>>>>
>>>> oc get csr gives me 3 CSRs:
>>>>
>>>> [root@master ~]# oc get csr
>>>> NAME        AGE   REQUESTOR            CONDITION
>>>> csr-4xjjb   24m   system:admin         Approved,Issued
>>>> csr-b6x45   24m   system:admin         Approved,Issued
>>>> csr-hgmpf   20m   system:node:master   Approved,Issued
>>>>
>>>> I believe I have 2 CSRs for system:admin because I ran
>>>> playbooks/openshift-node/join.yml a second time.
>>>>
>>>> The bootstrapping certificates on the master look fine(??):
>>>>
>>>> [root@master ~]# ll /etc/origin/node/certificates/
>>>> total 20
>>>> -rw-------. 1 root root 2830 iun 1 11:30 kubelet-client-2019-06-01-11-30-04.pem
>>>> -rw-------. 1 root root 1135 iun 1 11:31 kubelet-client-2019-06-01-11-31-23.pem
>>>> lrwxrwxrwx. 1 root root 68 iun 1 11:31 kubelet-client-current.pem -> /etc/origin/node/certificates/kubelet-client-2019-06-01-11-31-23.pem
>>>> -rw-------. 1 root root 1179 iun 1 11:35 kubelet-server-2019-06-01-11-35-42.pem
>>>> lrwxrwxrwx. 1 root root 68 iun 1 11:35 kubelet-server-current.pem -> /etc/origin/node/certificates/kubelet-server-2019-06-01-11-35-42.pem
>>>>
>>>> I've rechecked the open ports, thinking the issue lies in some
>>>> network-related config:
>>>> - all hosts have the node-related ports open: 53/udp, 10250/tcp, 4789/udp
>>>> - master (with etcd): 8053/udp+tcp, 2049/udp+tcp, 8443/tcp, 8444/tcp, 4789/udp, 53/udp
>>>> - infra has, on top of the node ports, those related to the router/routes and logging components it will host
>>>>
>>>> The chosen SDN is
>>>> os_sdn_network_plugin_name='redhat/openshift-ovs-multitenant', with
>>>> no extra config in the inventory file. (Do I need any?)
>>>>
>>>> Any hints about where and what to check would be much appreciated!
>>>>
>>>> Best regards,
>>>> Dan Pungă
>>>
>>> --
>>> Samuel Martín Moro
>>> {EPITECH.} 2011
>>>
>>> "Nobody wants to say how this works.
>>>  Maybe nobody knows ..."
>>>                       Xorg.conf(5)
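>>
>> P.S. Once the root cause is fixed, any CSRs still stuck in Pending can
>> be approved by hand (a sketch - approve blindly only on a cluster whose
>> pending requests you trust):
>>
>> # list outstanding requests, then approve them all at once
>> oc get csr
>> oc get csr -o name | xargs oc adm certificate approve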
