So the hostnames did not change, and after rolling back to just the BYO configuration and removing the AWS settings, I was able to get back up and running. This means the certificates were good as well.
I lost the ability to use EBS volumes by doing this, but we are in the process of moving to EFS anyway. I suspect the issue is tied up in the fact that these nodes have multiple aliases and a different local hostname than the one they have in the EC2 console. However, I'm not sure why this manifested itself after running successfully for 4 weeks. Either way, I'm moving on with just BYO.

thanks,
Isaac

Isaac Christoffersen <https://www.linkedin.com/in/ichristo>, Technical Director
w: 703.318.7800 x8202 | m: 703.980.2836 | @ichristo <http://twitter.com/ichristo>
Vizuri, a division of AEM Corporation
13880 Dulles Corner Lane # 300
Herndon, Virginia 20171
www.vizuri.com | @1Vizuri <http://twitter.com/1Vizuri>

On Thu, Sep 8, 2016 at 10:36 PM, Isaac Christoffersen <[email protected]> wrote:
> No, the hostnames are the same. Because I was getting the "external ID from cloud provider" error, I disabled the AWS configuration settings and left it as solely a BYO.
>
> This allowed me to get my nodes back up. There's definitely something with the AWS cloud provider settings and how instance names for nodes are being found.
>
> I only need the AWS config for EBS storage for Persistent Volumes, so I can't fully disable the AWS settings.
>
> How does the external ID lookup work? Can I verify the settings it expects?
>
> On Thu, Sep 8, 2016 at 9:24 PM, Jason DeTiberus <[email protected]> wrote:
>> On Sep 8, 2016 7:06 PM, "Isaac Christoffersen" <[email protected]> wrote:
>> >
>> > I'm running Origin in AWS, and after adding some shared EFS volumes to the node instances, the nodes seem to be unable to rejoin the cluster.
>> >
>> > It's a 3 Master + etcd setup with 4 application nodes. An 'oc get nodes' returns an empty list and, of course, none of the pods will start.
>> >
>> > The relevant error messages that I see are:
>> >
>> > "Unable to construct api.Node object for kubelet: failed to get external ID from cloud provider: instance not found"
>> > "Could not find an allocated subnet for node: ip-10-0-37-217....., Waiting..."
>> >
>> > and
>> >
>> > "Error updating node status, will retry: error getting node "ip-10-0-37-217....": nodes "ip-10-0-37-217...." not found"
>> >
>> > Any insights into how to start troubleshooting further? I'm baffled.
>>
>> Did the nodes come back up with a new IP address? If so, the internal DNS name would have also changed, and the node would need to be reconfigured accordingly.
>>
>> Items that would need to be updated:
>> - node name in the node config
>> - node serving certificate
>>
>> There is an Ansible playbook that can automate the redeployment of certificates as well (playbooks/byo/openshift-cluster/redeploy-certificates.yml).
>>
>> --
>> Jason DeTiberus
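[Editor's note] On the "how does the external ID lookup work" question upthread: the AWS cloud provider matches a node to an EC2 instance by name, so a quick sanity check is to compare the kubelet's node name against the instance's private DNS name. A minimal sketch, assuming the standard EC2 instance-metadata endpoint; the `normalize` helper is hypothetical, not part of any OpenShift tooling:

```shell
# Compare the name the kubelet registers with the name the AWS cloud
# provider will look up (the instance's private DNS name).
# Run on the node itself; 169.254.169.254 is the standard EC2 metadata endpoint.
meta=http://169.254.169.254/latest/meta-data
ec2_name=$(curl -s --max-time 2 "$meta/local-hostname")
node_name=$(hostname -f)

# Hypothetical helper: strip a trailing dot and lowercase before comparing.
normalize() { printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/\.$//'; }

if [ "$(normalize "$ec2_name")" = "$(normalize "$node_name")" ]; then
  echo "OK: node name matches EC2 private DNS name"
else
  echo "MISMATCH: kubelet sees '$node_name', EC2 reports '$ec2_name'"
fi
```

If the two names differ (multiple aliases, a custom local hostname, etc.), the lookup can fail with exactly the "failed to get external ID from cloud provider: instance not found" error quoted above.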
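[Editor's note] Jason's two update items can be sketched roughly as follows. The node-config location and the replacement name are assumptions for illustration, and the edit is shown against a temporary copy so it is safe to experiment with:

```shell
# Sketch: point nodeName at the instance's current internal DNS name.
# /etc/origin/node/node-config.yaml is the usual location; edit a copy here.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
nodeName: ip-10-0-12-34.ec2.internal
EOF

new_name=ip-10-0-37-217.ec2.internal   # hypothetical new internal DNS name
sed -i "s/^nodeName: .*/nodeName: $new_name/" "$cfg"
grep '^nodeName:' "$cfg"
# → nodeName: ip-10-0-37-217.ec2.internal

# Then redeploy the node serving certificates with the playbook Jason names:
# ansible-playbook -i <inventory> \
#   playbooks/byo/openshift-cluster/redeploy-certificates.yml
```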
_______________________________________________
users mailing list
[email protected]
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
