Re: How to recover from failed update in OpenShift 4.2.x?

2019-11-26 Thread Joel Pearson

On Thu, 21 Nov 2019 at 10:58, Clayton Coleman wrote:

>
>
> On Nov 17, 2019, at 9:34 PM, Joel Pearson wrote:
>
> So, I'm running OpenShift 4.2 on Azure UPI following this blog article:
> https://blog.openshift.com/openshift-4-1-upi-environment-deployment-on-microsoft-azure-cloud/
> with a few customisations on the terraform side.
>
> One of the main differences, it seems, is how the router/ingress is
> handled. Normal Azure uses load balancers, but UPI Azure uses a regular
> router (like the 3.x version I'm used to seeing), which is configured by
> setting "HostNetwork" as the endpoint publishing strategy.
> 
>
>
> This sounds like a bug in Azure UPI.  IPI is the reference architecture,
> it shouldn’t have a default divergent from the ref arch.
>

In the blog, the author mentions that he changed the architecture because
the default one creates a public-facing load balancer. In my case I'm not
allowed to create a public load balancer at all. Additionally, I can't use
Azure's Public or Private DNS either, so I had to customise the terraform
templates even more.

Maybe supported UPI Azure will allow internal-facing load balancers?


>
>
> It was all working fine in OpenShift 4.2.0 and 4.2.2, but when I upgraded
> to OpenShift 4.2.4, the router stopped listening on ports 80 and 443. I
> could see the pod running with "crictl ps", but "netstat -tpln" didn't
> show anything listening.
>
> I tried updating the version back from 4.2.4 to 4.2.2, but I
> accidentally used the 4.1.22 image digest value, so I quickly reverted to
> 4.2.4 once I saw the apiservers coming up as 4.1.22.  I then noticed that
> there was a 4.2.7 release on the candidate-4.2 channel, so I switched to
> that, and ingress started working properly again.
>
> So my question is, what is the strategy for recovering from a failed
> update? Do I need to have etcd backups and then restore the cluster by
> restoring etcd? I.e.
> https://docs.openshift.com/container-platform/4.2/backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.html
>
> The upgrade page specifically says "Reverting your cluster to a previous
> version, or a rollback, is not supported. Only upgrading to a newer
> version is supported." So is it an expectation for a production cluster
> that you would restore from backup if the cluster isn't usable?
>
>
> Backup, yes.  If you could open a bug for the documentation that would be
> great.
>

Thanks, raised it here: https://bugzilla.redhat.com/show_bug.cgi?id=1777155


>
>
> Maybe the upgrade page should mention taking backups? Especially if there
> is no rollback option.
>
> ___
> users mailing list
> users@lists.openshift.redhat.com
> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>
>

Re: How to recover from failed update in OpenShift 4.2.x?

2019-11-20 Thread Clayton Coleman

On Nov 17, 2019, at 9:34 PM, Joel Pearson wrote:

So, I'm running OpenShift 4.2 on Azure UPI following this blog article:
https://blog.openshift.com/openshift-4-1-upi-environment-deployment-on-microsoft-azure-cloud/
with a few customisations on the terraform side.

One of the main differences, it seems, is how the router/ingress is handled.
Normal Azure uses load balancers, but UPI Azure uses a regular router (like
the 3.x version I'm used to seeing), which is configured by setting
"HostNetwork" as the endpoint publishing strategy.
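
For readers who have not seen that setting, here is a minimal sketch of what such an IngressController manifest looks like, built as a Python dict and printed as JSON. It assumes the default ingress controller in the openshift-ingress-operator namespace; the helper name host_network_ingress_controller is hypothetical, and the blog's terraform may well set this differently.

```python
import json


def host_network_ingress_controller(name: str = "default") -> dict:
    """Build an IngressController manifest that uses HostNetwork.

    Assumption: the controller lives in openshift-ingress-operator,
    which is where the default one is normally found.
    """
    return {
        "apiVersion": "operator.openshift.io/v1",
        "kind": "IngressController",
        "metadata": {
            "name": name,
            "namespace": "openshift-ingress-operator",
        },
        "spec": {
            # Router pods bind ports 80/443 directly on the node
            # instead of sitting behind a cloud load balancer.
            "endpointPublishingStrategy": {"type": "HostNetwork"},
        },
    }


if __name__ == "__main__":
    print(json.dumps(host_network_ingress_controller(), indent=2))
```

The output is only meant for inspection or for comparing against what the cluster actually has configured.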

This sounds like a bug in Azure UPI. IPI is the reference architecture, it
shouldn’t have a default divergent from the ref arch.

It was all working fine in OpenShift 4.2.0 and 4.2.2, but when I upgraded
to OpenShift 4.2.4, the router stopped listening on ports 80 and 443. I
could see the pod running with "crictl ps", but "netstat -tpln" didn't
show anything listening.
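
The same "is anything listening?" check can be scripted so it can run right after an update. The following is a rough sketch, not a replacement for netstat: it only tests whether a TCP connection is accepted, and the function names and the 127.0.0.1 default are assumptions (it presumes the script runs on the node hosting the router pod).

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port is accepted."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


def check_router_ports(host: str, ports=(80, 443)) -> dict:
    """Map each port to whether something accepts connections on it."""
    return {port: is_listening(host, port) for port in ports}


if __name__ == "__main__":
    # Assumption: run on the node where the HostNetwork router is scheduled.
    for port, ok in check_router_ports("127.0.0.1").items():
        print(f"port {port}: {'listening' if ok else 'NOT listening'}")
```

A refused connection on 80 or 443 after an update would match the symptom described above.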

I tried updating the version back from 4.2.4 to 4.2.2, but I
accidentally used the 4.1.22 image digest value, so I quickly reverted to
4.2.4 once I saw the apiservers coming up as 4.1.22. I then noticed that
there was a 4.2.7 release on the candidate-4.2 channel, so I switched to
that, and ingress started working properly again.
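
A small guard could catch this kind of mix-up before a target version is applied. The sketch below only compares plain X.Y.Z version strings numerically and refuses anything that is not strictly newer. The function names are hypothetical, and looking up which version a given image digest corresponds to is left out.

```python
import re


def parse_version(version: str) -> tuple:
    """Parse 'X.Y.Z' into a tuple of ints, e.g. '4.2.4' -> (4, 2, 4)."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unexpected version format: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_forward_update(current: str, target: str) -> bool:
    """Return True only if target is strictly newer than current."""
    # Tuples compare element by element, so 4.2.10 sorts after 4.2.7.
    return parse_version(target) > parse_version(current)


if __name__ == "__main__":
    current = "4.2.4"
    for target in ("4.1.22", "4.2.2", "4.2.7"):
        ok = is_forward_update(current, target)
        verdict = "ok" if ok else "refuse (rollback)"
        print(f"{current} -> {target}: {verdict}")
```

With 4.2.4 as the current version, both 4.1.22 and 4.2.2 are refused and only 4.2.7 passes.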

So my question is, what is the strategy for recovering from a failed
update? Do I need to have etcd backups and then restore the cluster by
restoring etcd? I.e.
https://docs.openshift.com/container-platform/4.2/backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.html

The upgrade page specifically says "Reverting your cluster to a previous
version, or a rollback, is not supported. Only upgrading to a newer version
is supported." So is it an expectation for a production cluster that you
would restore from backup if the cluster isn't usable?

Backup, yes. If you could open a bug for the documentation that would be
great.

Maybe the upgrade page should mention taking backups? Especially if there
is no rollback option.


How to recover from failed update in OpenShift 4.2.x?

2019-11-17 Thread Joel Pearson

The upgrade page

Maybe the upgrade page should mention taking backups? Especially if there
is no rollback option.
