What’s wrong with the post-3.6 OpenShift/Origin release?

I build my clusters with Terraform, and OpenShift 3.6 (on AWS) is wonderfully 
stable; I have no problem creating clusters with it. But with both 3.7 and 3.9 I 
just cannot start a cluster without encountering at least one node with an 
empty /etc/cni/net.d.

This applies to 3.7 and 3.9 on AWS and on two OpenStack providers. In all cases 
the Ansible installer reaches the "RUNNING HANDLER [openshift_node : restart 
node]" task, but this fails for the vast majority of installations on OpenStack 
and for every single attempt on AWS. I'm worried that I've got something 
obviously wrong, and I have had to return to 3.6 to get anything done.

RUNNING HANDLER [openshift_node : restart openvswitch] 
********************************************************************************
Friday 13 April 2018  13:19:09 +0100 (0:00:00.062)       0:09:28.744 ********** 
changed: [18.195.236.210]
changed: [18.195.126.190]
changed: [18.184.65.88]

RUNNING HANDLER [openshift_node : restart openvswitch pause] 
**************************************************************************
Friday 13 April 2018  13:19:09 +0100 (0:00:00.720)       0:09:29.464 ********** 
skipping: [18.195.236.210]

RUNNING HANDLER [openshift_node : restart node] 
***************************************************************************************
Friday 13 April 2018  13:19:09 +0100 (0:00:00.036)       0:09:29.501 ********** 
FAILED - RETRYING: restart node (3 retries left).
FAILED - RETRYING: restart node (3 retries left).
FAILED - RETRYING: restart node (3 retries left).
FAILED - RETRYING: restart node (2 retries left).
FAILED - RETRYING: restart node (2 retries left).
FAILED - RETRYING: restart node (2 retries left).
FAILED - RETRYING: restart node (1 retries left).
FAILED - RETRYING: restart node (1 retries left).
FAILED - RETRYING: restart node (1 retries left).
fatal: [18.195.236.210]: FAILED! => {"attempts": 3, "changed": false, "msg": 
"Unable to restart service origin-node: Job for origin-node.service failed 
because the control process exited with error code. See \"systemctl status 
origin-node.service\" and \"journalctl -xe\" for details.\n"}
fatal: [18.195.126.190]: FAILED! => {"attempts": 3, "changed": false, "msg": 
"Unable to restart service origin-node: Job for origin-node.service failed 
because the control process exited with error code. See \"systemctl status 
origin-node.service\" and \"journalctl -xe\" for details.\n"}
fatal: [18.184.65.88]: FAILED! => {"attempts": 3, "changed": false, "msg": 
"Unable to restart service origin-node: Job for origin-node.service failed 
because the control process exited with error code. See \"systemctl status 
origin-node.service\" and \"journalctl -xe\" for details.\n"}

When I jump onto a suspect node after the failure I find /etc/cni/net.d is empty 
and the journal contains the message "No networks found in /etc/cni/net.d"...

-- The start-up result is done.
Apr 13 12:23:44 ip-10-0-0-61.eu-central-1.compute.internal 
origin-master-controllers[26728]: I0413 12:23:44.850154   26728 
leaderelection.go:179] attempting to acquire leader lease...
Apr 13 12:23:44 ip-10-0-0-61.eu-central-1.compute.internal origin-node[26683]: 
W0413 12:23:44.933963   26683 cni.go:189] Unable to update cni config: No 
networks found in /etc/cni/net.d
Apr 13 12:23:44 ip-10-0-0-61.eu-central-1.compute.internal origin-node[26683]: 
E0413 12:23:44.934447   26683 kubelet.go:2112] Container runtime network not 
ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network 
plugin is not ready: cni config uninitialized
Apr 13 12:23:47 ip-10-0-0-61.eu-central-1.compute.internal origin-node[26683]: 
W0413 12:23:47.947200   26683 sdn_controller.go:48] Could not find an allocated 
subnet for node: ip-10-0-0-61.eu-central-1.compute.internal, Waiting...
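For anyone wanting to reproduce the check, this is roughly what I run after the 
handler fails (the first three on the suspect node, the last from a master; 
nothing here is specific to my Terraform setup):

# On the failed node: is the CNI config really missing, and what does the node service say?
ls -la /etc/cni/net.d
systemctl status origin-node.service
journalctl -u origin-node --no-pager | grep -i cni

# On a master: has the SDN allocated a host subnet for that node yet?
oc get hostsubnets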

Is anyone else seeing this and, more importantly, is there a clear cause and 
solution?

I cannot start 3.7 on AWS at all, despite tinkering with it for days, and on 
OpenStack 3 out of 4 attempts fail. I just tried 3.9, hit the same failure on 
AWS, and have given up and returned to the wonderfully stable 3.6.

Alan Christie


