On Sep 8, 2016 7:06 PM, "Isaac Christoffersen" <[email protected]>
wrote:
>
> I'm running Origin in AWS and after adding some shared EFS volumes to the
node instances, the nodes seem to be unable to rejoin the cluster.
>
> It's a 3-master + etcd setup with 4 application nodes. 'oc get nodes'
returns an empty list and, of course, none of the pods will start.
>
>
> Various error messages that I see that are relevant are:
>
> "Unable to construct api.Node object for kubelet: failed to get external
ID from cloud provider: instance not found
> "Could not find an allocated subnet for node: ip-10-0-37-217..... ,
Waiting..."
>
> and
>
> ""Error updating node status, will retry: error getting node
"ip-10-0-37-217....": nodes "ip-10-0-37-217...." not found"
>
>
> Any insights into how to start troubleshooting further? I'm baffled.

Did the nodes come back up with a new IP address? If so, the internal DNS
name would have also changed and the node would need to be reconfigured
accordingly.

Items that would need to be updated:
- node name in the node config
- node serving certificate
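As a sketch of the first item: the node name lives in the node's config file
(typically /etc/origin/node/node-config.yaml on Origin; the path and the
hostname below are assumptions for illustration, not values from your
cluster). It would be updated to the node's new internal DNS name roughly
like this:

```yaml
# /etc/origin/node/node-config.yaml (typical Origin path; adjust for your install)
# Hypothetical example: nodeName must match the instance's new internal DNS name,
# which is what the AWS cloud provider reports for the instance.
nodeName: ip-10-0-99-123.ec2.internal
```

After changing it, the node service has to be restarted for the new name to
take effect.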

There is an Ansible playbook that can automate the redeployment of
certificates as well
(playbooks/byo/openshift-cluster/redeploy-certificates.yml).
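For example, run from the openshift-ansible checkout (the inventory path is
an assumption; point -i at whatever inventory you used for the original
install):

```shell
# Redeploy cluster certificates, including node serving certs, so they
# match the nodes' new internal DNS names. Assumes your install-time
# inventory is at /etc/ansible/hosts.
ansible-playbook -i /etc/ansible/hosts \
    playbooks/byo/openshift-cluster/redeploy-certificates.yml
```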

--
Jason DeTiberus
_______________________________________________
users mailing list
[email protected]
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
