Hi Wilfried,

This seems like a routing issue then: the cluster does not appear to be using the defined default gateway (a rough sketch of how I'd check that is just below). Maybe there is more information in one of the SDN pods? On a master node run: oc get pods -n openshift-sdn
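For the routing check, a rough sketch of what I'd compare on the master with origin-node stopped and then again once it is running (plain iproute2, nothing OKD-specific; <client-ip> stands for a host on the other subnet):

  # routing table and policy rules before/after the SDN devices come up
  ip route show
  ip rule show
  # where replies to that client would be sent; this should still go via ens192
  # and your normal gateway, not via tun0/br0
  ip route get <client-ip>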
Chose one pod and run oc logs <pod> -n openshift-sdn Regarding the url redirect. I need to look this up, but I had a similar issue. Check if all the hostnames are set correctly (always FQDN), and look in the /etc/origin/master/master-config.yaml file for the domain name of the api Regards, Nikolas ANUZET Wilfried <[email protected]> schrieb am Do. 18. Apr. 2019 um 11:09: > Hi Nikolas, > > > > Today I'll retry the installation from scratch, I made a snapshot of all > my VMs just after the installations of all prerequisites and before apply > all playbooks ;) > > > > Two things: > > - First: > > When I set the openshift_http(s)_proxy variable in my inventory I lost my > connection to the dockerhub registry -_-' > > I know that here our proxy is a complete mess (a long story…) so I > disabled theses variables and re-install and going back to the old situation > > The servers are accessible from the subnet they're on but not from another > subnet, I tried to access the console from a windows servers on the same > subnet and it was ok. Except the redirection from: > https://okdmst01.stluc.ucl.ac.be:8443 redirect to a short name like > https://okdmst01t:8443/console but I still can access and log in the okd > console. > > > > - Second: > > I monitored a basic ping to the master server from outside its subnet and > found the moment when I lost connection: > > thu avr 18 10:16:02 CEST 2019: 64 bytes from okdmst01t.stluc.ucl.ac.be > (10.244.246.66): icmp_seq=1044 ttl=63 time=0.684 ms > > > > At the same moment on journactl it seems that there is somes operations > with networkmanager: > > Apr 18 10:16:02 okdmst01t.stluc.ucl.ac.be sudo[27063]: > pam_unix(sudo:session): session closed for user root > > Apr 18 10:16:02 okdmst01t.stluc.ucl.ac.be sudo[27149]: aw1538 : > TTY=unknown ; PWD=/home/aw1538 ; USER=root ; COMMAND=/bin/sh -c echo > BECOME-SUCCESS-jqpitossiewjikeadherhxalutqivypc; /usr/bin/python > > Apr 18 10:16:02 okdmst01t.stluc.ucl.ac.be sudo[27149]: > pam_unix(sudo:session): session opened for user root by (uid=0) > > Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be kernel: device ovs-system > entered promiscuous mode > > Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be NetworkManager[10228]: <info> > [1555575363.0222] manager: (ovs-system): new Generic device > (/org/freedesktop/NetworkManager/Devices/4) > > Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be python[27156]: ansible-stat > Invoked with checksum_algorithm=sha1 get_checksum=True follow=True > checksum_algo=sha1 path=/usr/share/openshift/examples/ get_md5=None > get_mime=True get_attributes=True > > Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be sudo[27149]: > pam_unix(sudo:session): session closed for user root > > Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be kernel: device br0 entered > promiscuous mode > > Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be NetworkManager[10228]: <info> > [1555575363.0530] manager: (br0): new Generic device > (/org/freedesktop/NetworkManager/Devices/5) > > Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be kernel: device vxlan_sys_4789 > entered promiscuous mode > > Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be NetworkManager[10228]: <info> > [1555575363.0931] manager: (vxlan_sys_4789): new Vxlan device > (/org/freedesktop/NetworkManager/Devices/6) > > Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be NetworkManager[10228]: <info> > [1555575363.0933] device (vxlan_sys_4789): enslaved to non-master-type > device ovs-system; ignoring > > Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be NetworkManager[10228]: <info> > [1555575363.1002] device (vxlan_sys_4789): 
state change: unmanaged -> > unavailable (reason 'connection-assumed', sys-iface-state: 'external') > > Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be NetworkManager[10228]: <info> > [1555575363.1027] device (vxlan_sys_4789): enslaved to non-master-type > device ovs-system; ignoring > > Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be NetworkManager[10228]: <info> > [1555575363.1040] device (vxlan_sys_4789): state change: unavailable -> > disconnected (reason 'none', sys-iface-state: 'external') > > Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be kernel: device tun0 entered > promiscuous mode > > Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be NetworkManager[10228]: <info> > [1555575363.1180] manager: (tun0): new Generic device > (/org/freedesktop/NetworkManager/Devices/7) > > Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be NetworkManager[10228]: <info> > [1555575363.1326] device (tun0): carrier: link connected > > Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be sudo[27336]: aw1538 : > TTY=unknown ; PWD=/home/aw1538 ; USER=root ; COMMAND=/bin/sh -c echo > BECOME-SUCCESS-oqtjywlqnddthrxxoriwvlubusutuskn; /usr/bin/python > > Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be sudo[27336]: > pam_unix(sudo:session): session opened for user root by (uid=0) > > Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be kernel: ctnetlink v0.93: > registering with nfnetlink. > > Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be python[27350]: > ansible-unarchive Invoked with directory_mode=None force=None > remote_src=False exclude=[] owner=None follow=False group=None > unsafe_writes=None keep_newer=False setype=None > content=NOT_LOGGING_PARAMETER serole=None extra_opts=[] > dest=/usr/share/openshift/examples/ selevel=None regexp=None > src=/home/aw1538/.ansible/tmp/ansible-tmp-1555575362.6-29197543865924/source > validate_certs=True list_files=False seuser=None creates=None > delimiter=None mode=None attributes=None backup=None > > Apr 18 10:16:03 okdmst01t.stluc.ucl.ac.be sudo[27336]: > pam_unix(sudo:session): session closed for user root > > Apr 18 10:16:04 okdmst01t.stluc.ucl.ac.be dnsmasq[10085]: setting > upstream servers from DBus > > Apr 18 10:16:04 okdmst01t.stluc.ucl.ac.be dnsmasq[10085]: using > nameserver 10.97.200.151#53 > > Apr 18 10:16:04 okdmst01t.stluc.ucl.ac.be dnsmasq[10085]: using > nameserver 10.244.244.151#53 > > Apr 18 10:16:04 okdmst01t.stluc.ucl.ac.be dnsmasq[10085]: using > nameserver 127.0.0.1#53 for domain in-addr.arpa > > Apr 18 10:16:04 okdmst01t.stluc.ucl.ac.be dnsmasq[10085]: using > nameserver 127.0.0.1#53 for domain cluster.local > > > > At first sight it seems related with openshift SDN ? 
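(Inline note on the dnsmasq lines above: that part is expected. The SDN install drops a NetworkManager dispatcher script that repoints the local dnsmasq at your upstream DNS plus the cluster DNS. If I remember correctly the files on 3.11 look like the following, so treat the exact paths as an assumption:

  cat /etc/NetworkManager/dispatcher.d/99-origin-dns.sh
  cat /etc/dnsmasq.d/origin-dns.conf /etc/dnsmasq.d/origin-upstream-dns.conf
  # /etc/resolv.conf should now point at the node IP, with dnsmasq forwarding to the servers listed above
  cat /etc/resolv.conf

So the DNS reconfiguration itself shouldn't cut the node off; the interesting part is whether the routing table changes at the same moment.)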
> > It happen a few seconds after the task [ openshift_sdn: Apply the config ] > > > > > > [image: logo-stluc] > > *Wilfried Anuzet* > Service Infrastructure > Département Information & Systèmes > Tél: +32 2 764 2488 > ------------------------------ > > Avenue Hippocrate, 10 > <https://maps.google.com/?q=Avenue+Hippocrate,+10&entry=gmail&source=g> - > 1200 Bruxelles - Belgique - Tel: + 32 2 764 11 11 - www.saintluc.be > > [image: logo-fsl] > > Soutenez les Cliniques, soutenez la Fondation Saint-Luc > <http://www.fondationsaintluc.be/> > Support our Hospital, support Fondation Saint-Luc > <http://www.fondationsaintluc.be/> > > > > > > *De :* Nikolas Philips <[email protected]> > *Envoyé :* mercredi 17 avril 2019 12:00 > > > *À :* ANUZET Wilfried <[email protected]> > *Cc :* OpenShift Users List <[email protected]> > *Objet :* Re: OKD installation on CentOS 7.6 > > > > Hi Wilfried, > > maybe you should define the proxy used by the system also in the inventory > file: > > > https://docs.okd.io/latest/install/configuring_inventory_file.html#advanced-install-configuring-global-proxy > > I don't think this causes the issue, but you should define them there > anyway. > > Keep me up to date when you get some news :) > > > > Regards, > Nikolas > > > > Am Mi., 17. Apr. 2019 um 11:46 Uhr schrieb ANUZET Wilfried < > [email protected]>: > > Hi Nikolas, > > > > When I restart origin-node.service I indeed lost the connection. > > I will check and monitor the firewall and services as you recommanded and > came back. > > > > I already use the ansible installer playbooks from github, branch > release-3.11. > > > > The main difference from a vanilla CentOS for theses severs are: > > - Add Red Hat Satellite subscription to use our internal Satellite as > CentOS masters repositories > > - Installation of somes packages (mostly debugging tools like net-utils, > iotop… And tools to integrate Active Diretory oodjob, adcli ...) > > - The proxy and proxy credentials were configured at profile level > > - NTP use our internal NTP servers > > - Security rules are applied to be compliant with SCAP content "Standard > System Security Profile for Red Hat Enterprise Linux 7", it's mostly > auditing rules and few tules to disable root login via ssh and prevent > empty password login > > > > Thanks > > > > > > [image: logo-stluc] > > *Wilfried Anuzet* > Service Infrastructure > Département Information & Systèmes > Tél: +32 2 764 2488 > ------------------------------ > > Avenue Hippocrate, 10 > <https://maps.google.com/?q=Avenue+Hippocrate,+10&entry=gmail&source=g> - > 1200 Bruxelles - Belgique - Tel: + 32 2 764 11 11 - www.saintluc.be > > [image: logo-fsl] > > Soutenez les Cliniques, soutenez la Fondation Saint-Luc > <http://www.fondationsaintluc.be/> > Support our Hospital, support Fondation Saint-Luc > <http://www.fondationsaintluc.be/> > > > > > > *De :* Nikolas Philips <[email protected]> > *Envoyé :* mercredi 17 avril 2019 11:18 > *À :* ANUZET Wilfried <[email protected]> > *Cc :* OpenShift Users List <[email protected]> > *Objet :* Re: OKD installation on CentOS 7.6 > > > > Hi Wilfred, > > just as some input: When you can access your node while origin-node isn't > running/disable, what happens when you start docker and origin-node? The > access should go down I guess. That way you should be able to track down > the process causing the issue. 
For example set up an external port check on > 22, and log ps -ef / netstat -tupln / docker ps / journalctl / iptables -L > frequently to get the time/process when the node gets unavailable. > > OKD does not limit the network access based on subnet or smiliar. So this > behaviour is an unwanted side effect caused by the environment (network, > sysconfig, ext. firewall etc.). What is different from a vanilla CentOS > installation? Are there any routines while starting the node? Maybe it's an > issue of services going up in the wrong order. I wouldn't study the ansible > installer itself, as it seems to be working correctly. Try to find the > exact moment/process, when the access gets denied. > > And are you using the openshift-installer from github or from the CentOS > repository? The RPMs from the CentOS repo are not that well updated, so > maybe try the openshift-installer from github (branch release-3.11) and use > the playbooks from there. Sometimes there are relevant bug fixes included. > > > > Regards, > Nikolas > > > > Am Mi., 17. Apr. 2019 um 10:44 Uhr schrieb ANUZET Wilfried < > [email protected]>: > > Hi Nikolas, > > > > I just ask the netwok team here to see with them if there's something that > block OKD at network level and it seems not. > > > > And since you can access the servers only from certain hosts after the > installation really looks like an external component breaks somethings. > > Because you have a short window to access the server from your client, I'm > pretty sure it's not a local firewalld issue, as network/firewall go up > together. So a different service is causing the issue. I would try to > identify this processes, until it's clear what component issues that > behaviour. > > To me as well the issue seems related to an openshift component as the > server is inaccessible when OKD start. I'll try to identify which one … > > > > I asked you once for the wrong nodes. Is dnsmasq running on the LB node? > > I just checked and DNSMasq is not running on the LB. > > > > You could maybe verify that with stopping services origin-node and docker > and try to get rid of all openshift specific processes (also dnsmasq), so > only basic services are running (or disabling and reboot). > > Stop origin-node.service and docker.service units and nothing changed. > > disable origin-node.service and docker.service and reboot and the node > server is accessible from outside it's subnet. > > The issue seems clearly related to OKD ;) > > > > On the LB using a cli browser (lynx) I can access to the master URL ( > https://okdmst01t.stluc.ucl.ac.be:8443 which redirect correctly to > https://okdmst01t:8443/console/ = but obviously there a mention to > activate javascript on the login page ). > > I just saw that I forgot to put the okd master / node IP in the /etc/hosts > of the LB. > > I just add them but it change nothing. 
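(Inline: to take the browser and the LB out of the picture when testing from another subnet, a quick curl sketch against the master; hostname as in your setup, and /healthz is the master API health endpoint.)

  curl -kv https://okdmst01t.stluc.ucl.ac.be:8443/healthz
  # run from a host outside the node's subnet: an 'ok' body means the API is reachable,
  # a timeout points back at routing/firewall rather than at the console redirect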
> > > > I'm also out of idea but I will check every OKD pods and better read the > openshift installer (but as it's well wrtitten it's also insanely > imbricated with a lot of import_playbook, import_tasks …) > > > > :'( > > > > [image: logo-stluc] > > *Wilfried Anuzet* > Service Infrastructure > Département Information & Systèmes > Tél: +32 2 764 2488 > ------------------------------ > > Avenue Hippocrate, 10 > <https://maps.google.com/?q=Avenue+Hippocrate,+10&entry=gmail&source=g> - > 1200 Bruxelles - Belgique - Tel: + 32 2 764 11 11 - www.saintluc.be > > [image: logo-fsl] > > Soutenez les Cliniques, soutenez la Fondation Saint-Luc > <http://www.fondationsaintluc.be/> > Support our Hospital, support Fondation Saint-Luc > <http://www.fondationsaintluc.be/> > > > > > > *De :* Nikolas Philips <[email protected]> > *Envoyé :* mercredi 17 avril 2019 09:48 > *À :* ANUZET Wilfried <[email protected]> > *Cc :* OpenShift Users List <[email protected]> > *Objet :* Re: OKD installation on CentOS 7.6 > > > > Hi Wilfried, > > sadly I'm a bit out of ideas what could cause this issue. All the settings > and configs I saw from you were looking good. > > And since you can access the servers only from certain hosts after the > installation really looks like an external component breaks somethings. > > My guess would be that maybe an external/internal firewall blocks external > traffic to your nodes when certain ports are open (or similar). Maybe > because of DNS to prevent spoofing? (I asked you once for the wrong nodes. > Is dnsmasq running on the LB node?) > > You could maybe verify that with stopping services origin-node and docker > and try to get rid of all openshift specific processes (also dnsmasq), so > only basic services are running (or disabling and reboot). > > Because you have a short window to access the server from your client, I'm > pretty sure it's not a local firewalld issue, as network/firewall go up > together. So a different service is causing the issue. I would try to > identify this processes, until it's clear what component issues that > behaviour. > > > > But you can access the cluster through the LB (e.g. 8443 or 443), right? > > > > Regards, > > Nikolas > > > > Am Mi., 17. Apr. 
2019 um 09:12 Uhr schrieb ANUZET Wilfried < > [email protected]>: > > Hello Nikola, > > > > Here the output of the firewall-cmd command on the LB and master: > > LB: > > public (active) > > target: default > > icmp-block-inversion: no > > interfaces: ens192 > > sources: > > services: ssh dhcpv6-client > > ports: 10250/tcp 10256/tcp 80/tcp 443/tcp 4789/udp 9000-10000/tcp > 1936/tcp > > protocols: > > masquerade: no > > forward-ports: > > source-ports: > > icmp-blocks: > > rich rules: > > > > MASTER: > > public (active) > > target: default > > icmp-block-inversion: no > > interfaces: ens192 > > sources: > > services: ssh dhcpv6-client > > ports: 10250/tcp 10256/tcp 80/tcp 443/tcp 4789/udp 9000-10000/tcp > 1936/tcp 2379/tcp 2380/tcp 9000/tcp 8443/tcp 8444/tcp 8053/tcp 8053/udp > > protocols: > > masquerade: no > > forward-ports: > > source-ports: > > icmp-blocks: > > rich rules: > > > > > > [image: logo-stluc] > > *Wilfried Anuzet* > Service Infrastructure > Département Information & Systèmes > Tél: +32 2 764 2488 > ------------------------------ > > Avenue Hippocrate, 10 > <https://maps.google.com/?q=Avenue+Hippocrate,+10&entry=gmail&source=g> - > 1200 Bruxelles - Belgique - Tel: + 32 2 764 11 11 - www.saintluc.be > > [image: logo-fsl] > > Soutenez les Cliniques, soutenez la Fondation Saint-Luc > <http://www.fondationsaintluc.be/> > Support our Hospital, support Fondation Saint-Luc > <http://www.fondationsaintluc.be/> > > > > > > *De :* Nikolas Philips <[email protected]> > *Envoyé :* mardi 16 avril 2019 19:17 > *À :* ANUZET Wilfried <[email protected]> > *Cc :* OpenShift Users List <[email protected]> > *Objet :* Re: OKD installation on CentOS 7.6 > > > > Sorry Wilfried, > > I missed the line with "os_firewall_use_firewalld" in your inventory file. > > What's the output of "firewall-cmd --list-all" on the LB and master? > > > > > > Am Di., 16. Apr. 2019 um 17:52 Uhr schrieb ANUZET Wilfried < > [email protected]>: > > Thanks Nikolas; > > > > Here some answer to better identify the source problem: > > > > · I can connect via ssh before running the ansible installer, I > run another ansible playbook before to be compliant wit our enterprise > policy > > In this playbook I just ensure that firewalld is up an running but I keep > the default value (just ssh service open and icmp response not blocked.) > > If I uninstall Openshift and reboot the server I can connect to it again. > > > > · All of these servers have only one NIC > > > > · I tried to disable firewalld and flush all iptables rules but > stil can't join the server > > /!\ I just see that I can join the server with another server in the same > subnet without deactivate and flush the firewall /!\ > > > > · Connected on one node: > > disable origin node via systemd => still no connection > > add ssh port and icmp in iptable => still no connection or icmp response > > it seems that kubernets recreate some rules (via the pods/ docker > container which are still running ? do I have to stop them all via docker > container stop $(docker container ls -q) ?) 
> > > > · Here the information about one node > > ip a sh > > 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group > default qlen 1000 > > link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 > > inet 127.0.0.1/8 scope host lo > > valid_lft forever preferred_lft forever > > inet6 ::1/128 scope host > > valid_lft forever preferred_lft forever > > 2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP > group default qlen 1000 > > link/ether 00:50:56:92:79:03 brd ff:ff:ff:ff:ff:ff > > inet 10.244.246.68/24 brd 10.244.246.255 scope global noprefixroute > ens192 > > valid_lft forever preferred_lft forever > > inet6 fe80::250:56ff:fe92:7903/64 scope link > > valid_lft forever preferred_lft forever > > 3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue > state DOWN group default > > link/ether 02:42:cb:3e:8f:86 brd ff:ff:ff:ff:ff:ff > > inet 172.17.0.1/16 scope global docker0 > > valid_lft forever preferred_lft forever > > 4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group > default qlen 1000 > > link/ether 82:94:30:55:98:12 brd ff:ff:ff:ff:ff:ff > > 5: br0: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN group default > qlen 1000 > > link/ether ee:73:3d:25:b7:48 brd ff:ff:ff:ff:ff:ff > > 6: vxlan_sys_4789: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65535 qdisc > noqueue master ovs-system state UNKNOWN group default qlen 1000 > > link/ether 5a:63:33:de:9f:70 brd ff:ff:ff:ff:ff:ff > > inet6 fe80::5863:33ff:fede:9f70/64 scope link > > valid_lft forever preferred_lft forever > > 7: tun0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state > UNKNOWN group default qlen 1000 > > link/ether b6:35:b5:77:d4:60 brd ff:ff:ff:ff:ff:ff > > inet 10.131.0.1/23 brd 10.131.1.255 scope global tun0 > > valid_lft forever preferred_lft forever > > inet6 fe80::b435:b5ff:fe77:d460/64 scope link > > valid_lft forever preferred_lft forever > > > > netstat > > Active Internet connections (only servers) > > Proto Recv-Q Send-Q Local Address Foreign Address > State PID/Program name > > tcp 0 0 127.0.0.1:9101 0.0.0.0:* > LISTEN 16787/node_exporter > > tcp 0 0 0.0.0.0:111 0.0.0.0:* > LISTEN 1/systemd > > tcp 0 0 127.0.0.1:53 0.0.0.0:* > LISTEN 13155/openshift > > tcp 0 0 10.131.0.1:53 0.0.0.0:* > LISTEN 9666/dnsmasq > > tcp 0 0 10.244.246.68:53 0.0.0.0:* > LISTEN 9666/dnsmasq > > tcp 0 0 172.17.0.1:53 0.0.0.0:* > LISTEN 9666/dnsmasq > > tcp 0 0 0.0.0.0:22 0.0.0.0:* > LISTEN 6515/sshd > > tcp 0 0 127.0.0.1:11256 0.0.0.0:* > LISTEN 13155/openshift > > tcp 0 0 127.0.0.1:25 0.0.0.0:* > LISTEN 6762/master > > tcp6 0 0 :::9100 :::* > LISTEN 16837/./kube-rbac-p > > tcp6 0 0 :::111 :::* > LISTEN 1/systemd > > tcp6 0 0 :::10256 :::* > LISTEN 13155/openshift > > tcp6 0 0 fe80::5863:33ff:fede:53 :::* > LISTEN 9666/dnsmasq > > tcp6 0 0 fe80::b435:b5ff:fe77:53 :::* > LISTEN 9666/dnsmasq > > tcp6 0 0 fe80::250:56ff:fe92::53 :::* > LISTEN 9666/dnsmasq > > tcp6 0 0 :::22 :::* > LISTEN 6515/sshd > > tcp6 0 0 ::1:25 :::* > LISTEN 6762/master > > udp 0 0 127.0.0.1:53 0.0.0.0:* > 13155/openshift > > udp 0 0 10.131.0.1:53 0.0.0.0:* > 9666/dnsmasq > > udp 0 0 10.244.246.68:53 0.0.0.0:* > 9666/dnsmasq > > udp 0 0 172.17.0.1:53 0.0.0.0:* > 9666/dnsmasq > > udp 0 0 0.0.0.0:111 0.0.0.0:* > 1/systemd > > udp 0 0 127.0.0.1:323 0.0.0.0:* > 5855/chronyd > > udp 0 0 0.0.0.0:4789 0.0.0.0:* > - > > udp 0 0 0.0.0.0:922 0.0.0.0:* > 5857/rpcbind > > udp6 0 0 fe80::5863:33ff:fede:53 :::* > 9666/dnsmasq > > udp6 0 0 fe80::b435:b5ff:fe77:53 > :::* 9666/dnsmasq > > udp6 0 
0 fe80::250:56ff:fe92::53 > :::* 9666/dnsmasq > > udp6 0 0 :::111 > :::* 1/systemd > > udp6 0 0 ::1:323 > :::* 5855/chronyd > > udp6 0 0 :::4789 > :::* - > > udp6 0 0 :::922 > :::* 5857/rpcbind > >
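PS: regarding the earlier idea of logging the node state around the moment access drops — a rough sketch of what I meant (interval and log path are arbitrary; run it on the master while the playbooks apply the SDN config):

  while true; do
    {
      date
      ip route show
      iptables -S
      netstat -tupln
      docker ps --format '{{.ID}} {{.Names}}'
      echo '---'
    } >> /tmp/node-watch.log 2>&1
    sleep 30
  done

Comparing the last snapshot before the ping stops with the first one after should show whether it is a route, an iptables rule, or a listener that changes.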
_______________________________________________
users mailing list
[email protected]
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
