Thanks Clayton, I’ll take a closer look next week. The suggestion seems to fix the symptom rather than the cause, and I’d like to get to a stage where we don’t need to patch the installation and restart it.
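
For anyone following along, the "patch" I mean is essentially the sequence you suggest below, applied by hand. Roughly (the hostsubnet check is my own addition, so treat it as a guess rather than the official diagnostic; the node name is just the one from my logs):

    # On a master: look for the failing node in the controller logs
    journalctl -u origin-master-controllers | grep ip-10-0-0-61.eu-central-1.compute.internal

    # Check whether the master ever allocated an SDN subnet to that node
    oc get hostsubnets

    # If the node is missing, restart the controllers on every master,
    # then retry the node service
    systemctl restart origin-master-controllers
    systemctl restart origin-node

That does get the node going, which is why I say it treats the symptom rather than the cause.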
This happens pretty much *every time* I install 3.7 or 3.9 on AWS, and a significant number of times on OpenStack. Has this been reported by others? It is so common that we can’t be the only ones seeing it.

Alan

> On 13 Apr 2018, at 21:35, Clayton Coleman <[email protected]> wrote:
>
> "Can not find allocated subnet" usually means the master didn’t hand out a
> chunk of SDN IPs to that node. Check the master’s origin-master-controller
> logs and look for anything that relates to the node name mentioned in your
> error. If you see a problem, try restarting the origin-master-controllers
> processes on all nodes.
>
> On Apr 13, 2018, at 2:26 PM, Alan Christie <[email protected]> wrote:
>
>> What’s wrong with the post-3.6 OpenShift/Origin release?
>>
>> I build my cluster with Terraform, and OpenShift 3.6 (on AWS) is wonderfully
>> stable: I have no problem creating clusters. But with both 3.7 and 3.9 I
>> just cannot start a cluster without encountering at least one node with an
>> empty /etc/cni/net.d.
>>
>> This applies to 3.7 and 3.9 on AWS and on two OpenStack providers. In all
>> cases the Ansible installer enters the "RUNNING HANDLER [openshift_node :
>> restart node]" task, but this fails for the vast majority of installations
>> on OpenStack and for every single attempt on AWS. I’m worried that I’ve got
>> something clearly very wrong and have had to return to 3.6 to get anything
>> done.
>>
>> RUNNING HANDLER [openshift_node : restart openvswitch] ********************************************************************************
>> Friday 13 April 2018 13:19:09 +0100 (0:00:00.062)  0:09:28.744 **********
>> changed: [18.195.236.210]
>> changed: [18.195.126.190]
>> changed: [18.184.65.88]
>>
>> RUNNING HANDLER [openshift_node : restart openvswitch pause] **************************************************************************
>> Friday 13 April 2018 13:19:09 +0100 (0:00:00.720)  0:09:29.464 **********
>> skipping: [18.195.236.210]
>>
>> RUNNING HANDLER [openshift_node : restart node] ***************************************************************************************
>> Friday 13 April 2018 13:19:09 +0100 (0:00:00.036)  0:09:29.501 **********
>> FAILED - RETRYING: restart node (3 retries left).
>> FAILED - RETRYING: restart node (3 retries left).
>> FAILED - RETRYING: restart node (3 retries left).
>> FAILED - RETRYING: restart node (2 retries left).
>> FAILED - RETRYING: restart node (2 retries left).
>> FAILED - RETRYING: restart node (2 retries left).
>> FAILED - RETRYING: restart node (1 retries left).
>> FAILED - RETRYING: restart node (1 retries left).
>> FAILED - RETRYING: restart node (1 retries left).
>> fatal: [18.195.236.210]: FAILED! => {"attempts": 3, "changed": false, "msg": "Unable to restart service origin-node: Job for origin-node.service failed because the control process exited with error code. See \"systemctl status origin-node.service\" and \"journalctl -xe\" for details.\n"}
>> fatal: [18.195.126.190]: FAILED! => {"attempts": 3, "changed": false, "msg": "Unable to restart service origin-node: Job for origin-node.service failed because the control process exited with error code. See \"systemctl status origin-node.service\" and \"journalctl -xe\" for details.\n"}
>> fatal: [18.184.65.88]: FAILED! => {"attempts": 3, "changed": false, "msg": "Unable to restart service origin-node: Job for origin-node.service failed because the control process exited with error code. See \"systemctl status origin-node.service\" and \"journalctl -xe\" for details.\n"}
>>
>> When I jump onto a suspect node after the failure I find /etc/cni/net.d is
>> empty and the journal contains the message "No networks found in
>> /etc/cni/net.d"...
>>
>> -- The start-up result is done.
>> Apr 13 12:23:44 ip-10-0-0-61.eu-central-1.compute.internal origin-master-controllers[26728]: I0413 12:23:44.850154 26728 leaderelection.go:179] attempting to acquire leader lease...
>> Apr 13 12:23:44 ip-10-0-0-61.eu-central-1.compute.internal origin-node[26683]: W0413 12:23:44.933963 26683 cni.go:189] Unable to update cni config: No networks found in /etc/cni/net.d
>> Apr 13 12:23:44 ip-10-0-0-61.eu-central-1.compute.internal origin-node[26683]: E0413 12:23:44.934447 26683 kubelet.go:2112] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
>> Apr 13 12:23:47 ip-10-0-0-61.eu-central-1.compute.internal origin-node[26683]: W0413 12:23:47.947200 26683 sdn_controller.go:48] Could not find an allocated subnet for node: ip-10-0-0-61.eu-central-1.compute.internal, Waiting...
>>
>> Is anyone else seeing this and, more importantly, is there a clear cause and
>> solution?
>>
>> I cannot start 3.7 on AWS at all, despite tinkering with it for days, and on
>> OpenStack 3 out of 4 attempts fail. I just tried 3.9 only to find the same
>> failure on AWS, and have given up and returned to the wonderfully stable 3.6.
>>
>> Alan Christie
_______________________________________________
users mailing list
[email protected]
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
