On Sep 8, 2016 7:06 PM, "Isaac Christoffersen" <[email protected]> wrote:
>
> I'm running Origin in AWS, and after adding some shared EFS volumes to the node instances, the nodes seem to be unable to rejoin the cluster.
>
> It's a 3 Master + etcd setup with 4 application Nodes. An 'oc get nodes' returns an empty list and, of course, none of the pods will start.
>
> Various relevant error messages that I see:
>
> "Unable to construct api.Node object for kubelet: failed to get external ID from cloud provider: instance not found"
> "Could not find an allocated subnet for node: ip-10-0-37-217..... , Waiting..."
>
> and
>
> "Error updating node status, will retry: error getting node "ip-10-0-37-217....": nodes "ip-10-0-37-217...." not found"
>
> Any insights into how to start troubleshooting further? I'm baffled.
Did the nodes come back up with a new IP address? If so, the internal DNS name would also have changed, and the node would need to be reconfigured accordingly. Items that would need to be updated:

- node name in the node config
- node serving certificate

There is an Ansible playbook that can automate the redeployment of certificates as well (playbooks/byo/openshift-cluster/redeploy-certificates.yml).

--
Jason DeTiberus
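A rough sketch of how those checks might look on an Origin node, assuming a default install layout (the /etc/origin/node/node-config.yaml path and the /etc/ansible/hosts inventory location are assumptions; the playbook path is the one from the reply above):

```shell
# On an affected node: see which node name the kubelet registers with.
# node-config.yaml location is the Origin default (an assumption --
# adjust for your install).
grep nodeName /etc/origin/node/node-config.yaml

# Compare against the instance's current internal DNS name from the EC2
# metadata service; if they differ, the node config needs updating.
curl -s http://169.254.169.254/latest/meta-data/local-hostname

# From the Ansible control host: redeploy certificates so the node's
# serving cert matches the new hostname (inventory path is an assumption).
ansible-playbook -i /etc/ansible/hosts \
    playbooks/byo/openshift-cluster/redeploy-certificates.yml
```

After the config and certificates are updated, restarting the node service and re-running `oc get nodes` should show the nodes registering under their new names.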
_______________________________________________
users mailing list
[email protected]
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
