So the hostnames did not change, and after rolling back to just the BYO configuration and removing the AWS settings, I was able to get back up and running. This means the certificates were good as well.
I lost the ability to use EBS volumes by doing this, but we are in the process of moving to EFS anyway. I suspect the issue is tied up in the fact that these nodes have multiple aliases and a different local hostname than the one they have in the EC2 console. However, I'm not sure why this manifested itself after running successfully for 4 weeks. Either way, I'm moving on with just BYO.

thanks,
Isaac

Isaac Christoffersen <https://www.linkedin.com/in/ichristo>, Technical Director
w: 703.318.7800 x8202 | m: 703.980.2836 | @ichristo <http://twitter.com/ichristo>
Vizuri, a division of AEM Corporation
13880 Dulles Corner Lane # 300
Herndon, Virginia 20171
www.vizuri.com | @1Vizuri <http://twitter.com/1Vizuri>

On Thu, Sep 8, 2016 at 10:36 PM, Isaac Christoffersen <[email protected]> wrote:
> No, the hostnames are the same. Because I was getting the "external ID from cloud provider" error, I disabled the AWS configuration settings and left it as solely a BYO.
>
> This allowed me to get my nodes back up. There's definitely something with the AWS cloud provider settings and how instance names for nodes are being found.
>
> I only need the AWS config for EBS storage for Persistent Volumes, so I can't fully disable the AWS settings.
>
> How does the external ID lookup work? Can I verify the settings it expects?
>
> On Thu, Sep 8, 2016 at 9:24 PM, Jason DeTiberus <[email protected]> wrote:
>> On Sep 8, 2016 7:06 PM, "Isaac Christoffersen" <[email protected]> wrote:
>> >
>> > I'm running Origin in AWS, and after adding some shared EFS volumes to the node instances, the nodes seem to be unable to rejoin the cluster.
>> >
>> > It's a 3 Master + etcd setup with 4 application nodes. An 'oc get nodes' returns an empty list and, of course, none of the pods will start.
>> >
>> > The relevant error messages that I see are:
>> >
>> > "Unable to construct api.Node object for kubelet: failed to get external ID from cloud provider: instance not found"
>> > "Could not find an allocated subnet for node: ip-10-0-37-217....., Waiting..."
>> >
>> > and
>> >
>> > "Error updating node status, will retry: error getting node "ip-10-0-37-217....": nodes "ip-10-0-37-217...." not found"
>> >
>> > Any insights into how to start troubleshooting further? I'm baffled.
>>
>> Did the nodes come back up with a new IP address? If so, the internal DNS name would have also changed, and the node would need to be reconfigured accordingly.
>>
>> Items that would need to be updated:
>> - node name in the node config
>> - node serving certificate
>>
>> There is an Ansible playbook that can automate the redeployment of certificates as well (playbooks/byo/openshift-cluster/redeploy-certificates.yml).
>>
>> --
>> Jason DeTiberus
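[Editor's note] On the "how does the external ID lookup work" question upthread: the AWS cloud provider matches a node to an EC2 instance by name, so a quick sanity check is to compare the kubelet's node name against the instance's private DNS name. A minimal sketch, assuming the standard EC2 instance-metadata endpoint; the `normalize` helper is hypothetical, not part of any OpenShift tooling:

```shell
# Compare the name the kubelet registers with the name the AWS cloud
# provider will look up (the instance's private DNS name).
# Run on the node itself; 169.254.169.254 is the standard EC2 metadata endpoint.
meta=http://169.254.169.254/latest/meta-data
ec2_name=$(curl -s --max-time 2 "$meta/local-hostname")
node_name=$(hostname -f)

# Hypothetical helper: strip a trailing dot and lowercase before comparing.
normalize() { printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/\.$//'; }

if [ "$(normalize "$ec2_name")" = "$(normalize "$node_name")" ]; then
  echo "OK: node name matches EC2 private DNS name"
else
  echo "MISMATCH: kubelet sees '$node_name', EC2 reports '$ec2_name'"
fi
```

If the two names differ (multiple aliases, a custom local hostname, etc.), the lookup can fail with exactly the "failed to get external ID from cloud provider: instance not found" error quoted above.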
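[Editor's note] Jason's two update items can be sketched roughly as follows. The node-config location and the replacement name are assumptions for illustration, and the edit is shown against a temporary copy so it is safe to experiment with:

```shell
# Sketch: point nodeName at the instance's current internal DNS name.
# /etc/origin/node/node-config.yaml is the usual location; edit a copy here.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
nodeName: ip-10-0-12-34.ec2.internal
EOF

new_name=ip-10-0-37-217.ec2.internal   # hypothetical new internal DNS name
sed -i "s/^nodeName: .*/nodeName: $new_name/" "$cfg"
grep '^nodeName:' "$cfg"
# → nodeName: ip-10-0-37-217.ec2.internal

# Then redeploy the node serving certificates with the playbook Jason names:
# ansible-playbook -i <inventory> \
#   playbooks/byo/openshift-cluster/redeploy-certificates.yml
```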
_______________________________________________
users mailing list
[email protected]
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
