Thanks Clayton, I’ll take a closer look next week. The suggestion seems to fix the symptom rather than the cause, and I’d like to get to a stage where we don’t need to patch the installation and restart it.
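
For anyone following along, the "patch" I mean is essentially the sequence you suggest below, applied by hand. Roughly (the hostsubnet check is my own addition, so treat it as a guess rather than the official diagnostic; the node name is just the one from my logs):

    # On a master: look for the failing node in the controller logs
    journalctl -u origin-master-controllers | grep ip-10-0-0-61.eu-central-1.compute.internal

    # Check whether the master ever allocated an SDN subnet to that node
    oc get hostsubnets

    # If the node is missing, restart the controllers on every master,
    # then retry the node service
    systemctl restart origin-master-controllers
    systemctl restart origin-node

That does get the node going, which is why I say it treats the symptom rather than the cause.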
This happens pretty much *every time* I install 3.7 or 3.9 on AWS, and a significant number of times on OpenStack. Has this been reported by others? It is so common that we can’t be the only ones seeing it.

Alan

> On 13 Apr 2018, at 21:35, Clayton Coleman <[email protected]> wrote:
>
> "Can not find allocated subnet" usually means the master didn’t hand out a
> chunk of SDN IPs to that node. Check the master’s origin-master-controller
> logs and look for anything that relates to the node name mentioned in your
> error. If you see a problem, try restarting the origin-master-controllers
> processes on all nodes.
>
> On Apr 13, 2018, at 2:26 PM, Alan Christie <[email protected]> wrote:
>
>> What’s wrong with the post-3.6 OpenShift/Origin release?
>>
>> I build my cluster with Terraform, and OpenShift 3.6 (on AWS) is wonderfully
>> stable: I have no problem creating clusters. But with both 3.7 and 3.9 I
>> just cannot start a cluster without encountering at least one node with an
>> empty /etc/cni/net.d.
>>
>> This applies to 3.7 and 3.9 on AWS and on two OpenStack providers. In all
>> cases the Ansible installer enters the "RUNNING HANDLER [openshift_node :
>> restart node]" task, but this fails for the vast majority of installations
>> on OpenStack and for every single attempt on AWS. I’m worried that I’ve got
>> something clearly very wrong and have had to return to 3.6 to get anything
>> done.
>>
>> RUNNING HANDLER [openshift_node : restart openvswitch] ********************************************************************************
>> Friday 13 April 2018 13:19:09 +0100 (0:00:00.062)  0:09:28.744 **********
>> changed: [18.195.236.210]
>> changed: [18.195.126.190]
>> changed: [18.184.65.88]
>>
>> RUNNING HANDLER [openshift_node : restart openvswitch pause] **************************************************************************
>> Friday 13 April 2018 13:19:09 +0100 (0:00:00.720)  0:09:29.464 **********
>> skipping: [18.195.236.210]
>>
>> RUNNING HANDLER [openshift_node : restart node] ***************************************************************************************
>> Friday 13 April 2018 13:19:09 +0100 (0:00:00.036)  0:09:29.501 **********
>> FAILED - RETRYING: restart node (3 retries left).
>> FAILED - RETRYING: restart node (3 retries left).
>> FAILED - RETRYING: restart node (3 retries left).
>> FAILED - RETRYING: restart node (2 retries left).
>> FAILED - RETRYING: restart node (2 retries left).
>> FAILED - RETRYING: restart node (2 retries left).
>> FAILED - RETRYING: restart node (1 retries left).
>> FAILED - RETRYING: restart node (1 retries left).
>> FAILED - RETRYING: restart node (1 retries left).
>> fatal: [18.195.236.210]: FAILED! => {"attempts": 3, "changed": false, "msg": "Unable to restart service origin-node: Job for origin-node.service failed because the control process exited with error code. See \"systemctl status origin-node.service\" and \"journalctl -xe\" for details.\n"}
>> fatal: [18.195.126.190]: FAILED! => {"attempts": 3, "changed": false, "msg": "Unable to restart service origin-node: Job for origin-node.service failed because the control process exited with error code. See \"systemctl status origin-node.service\" and \"journalctl -xe\" for details.\n"}
>> fatal: [18.184.65.88]: FAILED! => {"attempts": 3, "changed": false, "msg": "Unable to restart service origin-node: Job for origin-node.service failed because the control process exited with error code. See \"systemctl status origin-node.service\" and \"journalctl -xe\" for details.\n"}
>>
>> When I jump onto a suspect node after the failure I find /etc/cni/net.d is
>> empty and the journal contains the message "No networks found in
>> /etc/cni/net.d"...
>>
>> -- The start-up result is done.
>> Apr 13 12:23:44 ip-10-0-0-61.eu-central-1.compute.internal origin-master-controllers[26728]: I0413 12:23:44.850154 26728 leaderelection.go:179] attempting to acquire leader lease...
>> Apr 13 12:23:44 ip-10-0-0-61.eu-central-1.compute.internal origin-node[26683]: W0413 12:23:44.933963 26683 cni.go:189] Unable to update cni config: No networks found in /etc/cni/net.d
>> Apr 13 12:23:44 ip-10-0-0-61.eu-central-1.compute.internal origin-node[26683]: E0413 12:23:44.934447 26683 kubelet.go:2112] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
>> Apr 13 12:23:47 ip-10-0-0-61.eu-central-1.compute.internal origin-node[26683]: W0413 12:23:47.947200 26683 sdn_controller.go:48] Could not find an allocated subnet for node: ip-10-0-0-61.eu-central-1.compute.internal, Waiting...
>>
>> Is anyone else seeing this and, more importantly, is there a clear cause and
>> solution?
>>
>> I cannot start 3.7 on AWS at all, despite tinkering with it for days, and on
>> OpenStack 3 out of 4 attempts fail. I just tried 3.9 only to find the same
>> failure on AWS, and have given up and returned to the wonderfully stable 3.6.
>>
>> Alan Christie
_______________________________________________
users mailing list
[email protected]
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
