Re: How to recover from failed update in OpenShift 4.2.x?

Joel Pearson Tue, 26 Nov 2019 19:24:38 -0800

On Thu, 21 Nov 2019 at 10:58, Clayton Coleman <ccole...@redhat.com> wrote:


>
>
> On Nov 17, 2019, at 9:34 PM, Joel Pearson <japear...@agiledigital.com.au>
> wrote:
>
> So, I'm running OpenShift 4.2 on Azure UPI following this blog article:
> https://blog.openshift.com/openshift-4-1-upi-environment-deployment-on-microsoft-azure-cloud/
> with a few customisations on the Terraform side.
>
> One of the main differences, it seems, is how the router/ingress is
> handled. Normal Azure uses load balancers, but UPI Azure uses a regular
> router (the kind I'm used to seeing in 3.x), which is configured by
> setting the endpoint publishing strategy to "HostNetwork":
> <https://github.com/JuozasA/ocp4-azure-upi/blob/master/ingresscontroller-default.yaml#L9-L10>
>
>
> This sounds like a bug in Azure UPI.  IPI is the reference architecture;
> it shouldn’t have a default that diverges from the reference architecture.
>

In the blog, he mentions that he changed the architecture because it
creates a public-facing load balancer.  In my case, I'm not allowed to
create a public load balancer at all. I also can't use Azure's public or
private DNS, so I had to customise the Terraform templates even more.

Maybe supported UPI Azure will allow internal-facing load balancers?


>
>
> It was all working fine in OpenShift 4.2.0 and 4.2.2, but when I upgraded
> to OpenShift 4.2.4, the router stopped listening on ports 80 and 443. I
> could see the pod running with "crictl ps", but "netstat -tpln" didn't
> show anything listening.
>
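
With a HostNetwork router, the symptom described above can also be checked
from outside the node. The following is a minimal sketch, assuming the node
address is reachable from wherever the script runs. `is_listening` and
`check_router_ports` are hypothetical helper names. A successful TCP connect
only shows that something accepts connections on the port, not that it is
the router.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_router_ports(host: str, ports=(80, 443)) -> dict:
    """Map each port to whether something on the host accepts connections."""
    return {port: is_listening(host, port) for port in ports}
```

Calling `check_router_ports("<node address>")` on a healthy node should
report both ports as True; in the state described above, both would be False.
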
> I tried updating the version back from 4.2.4 to 4.2.2, but I
> accidentally used the 4.1.22 image digest value, so I quickly reverted to
> 4.2.4 once I saw the apiservers coming up as 4.1.22.  I then noticed that
> there was a 4.2.7 release on the candidate-4.2 channel, so I switched to
> that, and ingress started working properly again.
>
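
A mix-up like the one above (a 4.1.22 digest applied to a 4.2.4 cluster) is
the kind of thing a small pre-flight check could catch. This is a minimal
sketch, assuming the target version string has already been read from the
release image's metadata (how it is obtained is not shown here).
`parse_version` and `is_rollback` are hypothetical helper names, not part of
any OpenShift tooling.

```python
def parse_version(version: str) -> tuple:
    """Turn '4.2.4' into (4, 2, 4) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def is_rollback(current: str, target: str) -> bool:
    """Return True if target is an older version than current."""
    return parse_version(target) < parse_version(current)
```

Comparing integer tuples rather than raw strings matters here: as strings,
"4.2.10" sorts before "4.2.4", which would wrongly flag a valid upgrade.
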
> So my question is, what is the strategy for recovering from a failed
> update? Do I need to have etcd backups and then restore the cluster by
> restoring etcd? I.e.:
> https://docs.openshift.com/container-platform/4.2/backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.html
>
> The upgrade page
> <https://docs.openshift.com/container-platform/4.2/updating/updating-cluster-between-minor.html>
> specifically says "Reverting your cluster to a previous version, or a
> rollback, is not supported. Only upgrading to a newer version is
> supported." So is the expectation for a production cluster that you would
> restore from backup if the cluster isn't usable?
>
>
> Backup, yes.  If you could open a bug for the documentation that would be
> great.
>

Thanks, raised it here: https://bugzilla.redhat.com/show_bug.cgi?id=1777155


>
>
> Maybe the upgrade page should mention taking backups? Especially if there
> is no rollback option.
>
> _______________________________________________
> users mailing list
> users@lists.openshift.redhat.com
> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>
>
