Hi Tim,

Can you verify that the hosts' clocks are being synced correctly, as per Simon's other suggestion?
Ben

On Tue, 31 Mar 2020 at 16:05, Tim Dudgeon <tdudgeon...@gmail.com> wrote:

> Hi Simon,
>
> we've run those playbooks and all certs are reported as still being valid.
>
> Tim
>
> On 31/03/2020 15:59, Simon Krenger wrote:
> > Hi Tim,
> >
> > Note that there are multiple sets of certificates, both external and
> > internal. So it would be worth checking the certificates again using
> > the Certificate Expiration Playbooks (see link below). The
> > documentation also has an overview of what can be done to renew
> > certain certificates:
> >
> > - [ Redeploying Certificates ]
> > https://docs.okd.io/3.11/install_config/redeploying_certificates.html
> >
> > Apart from checking all certificates, I'd certainly review the time
> > synchronisation for the whole cluster, as we see the message "x509:
> > certificate has expired or is not yet valid".
> >
> > I hope this helps.
> >
> > Kind regards
> > Simon
> >
> > On Tue, Mar 31, 2020 at 4:33 PM Tim Dudgeon <tdudgeon...@gmail.com> wrote:
> >> One of our OKD 3.11 clusters has suddenly stopped working without any
> >> obvious reason.
> >>
> >> The origin-node service on the nodes does not start (times out).
> >> The master-api pod is running on the master.
> >> The nodes can access the master-api endpoints.
> >>
> >> The logs of the master-api pod look mostly OK other than a huge number
> >> of warnings about certificates that don't really make sense as the
> >> certificates are valid (we use named certificates from Let's Encrypt and
> >> they were renewed about 2 weeks ago and all appear to be correct).
> >>
> >> Examples of errors from the master-api pod are:
> >>
> >> I0331 12:46:57.065147 1 establishing_controller.go:73] Starting EstablishingController
> >> I0331 12:46:57.065561 1 logs.go:49] http: TLS handshake error from 192.168.160.17:58024: EOF
> >> I0331 12:46:57.071932 1 logs.go:49] http: TLS handshake error from 192.168.160.19:48102: EOF
> >> I0331 12:46:57.072036 1 logs.go:49] http: TLS handshake error from 192.168.160.19:37178: EOF
> >> I0331 12:46:57.072141 1 logs.go:49] http: TLS handshake error from 192.168.160.17:58022: EOF
> >>
> >> E0331 12:47:37.855023 1 memcache.go:147] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
> >> E0331 12:47:37.856569 1 memcache.go:147] couldn't get resource list for servicecatalog.k8s.io/v1beta1: the server is currently unable to handle the request
> >> E0331 12:47:44.115290 1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate has expired or is not yet valid, x509: certificate has expired or is not yet valid]
> >> E0331 12:47:44.118976 1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate has expired or is not yet valid, x509: certificate has expired or is not yet valid]
> >> E0331 12:47:44.122276 1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate has expired or is not yet valid, x509: certificate has expired or is not yet valid]
> >>
> >> Huge number of this second sort.
> >>
> >> Any ideas what is wrong?
> >>
> >> _______________________________________________
> >> users mailing list
> >> users@lists.openshift.redhat.com
> >> http://lists.openshift.redhat.com/openshiftmm/listinfo/users

--
BENJAMIN HOLMES
SENIOR Solution ARCHITECT
Red Hat UKI Presales <https://www.redhat.com/>
bhol...@redhat.com M: 07876-885388
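For anyone following the clock-sync suggestion in this thread: the message "x509: certificate has expired or is not yet valid" comes from comparing the certificate's validity window against the local clock, so a certificate that is perfectly fine can still be rejected on a host whose clock has drifted. The sketch below only illustrates that comparison. The dates, the 90-day lifetime, and the function name `check_validity` are assumptions made up for the example, not values taken from the cluster discussed above, and it does not claim to identify the actual cause of the problem.

```python
from datetime import datetime, timedelta, timezone


def check_validity(not_before, not_after, now):
    """Classify a certificate validity window against a given clock reading."""
    if now < not_before:
        return "not yet valid"
    if now > not_after:
        return "expired"
    return "valid"


# Hypothetical certificate: issued 17 Mar 2020, valid for 90 days (assumed values).
not_before = datetime(2020, 3, 17, tzinfo=timezone.utc)
not_after = not_before + timedelta(days=90)

# A correct clock reading, plus two hypothetical drifted clocks.
true_now = datetime(2020, 3, 31, 12, 47, tzinfo=timezone.utc)
skewed_back = true_now - timedelta(days=30)    # host clock a month behind
skewed_ahead = true_now + timedelta(days=120)  # host clock far ahead

for label, clock in [
    ("correct clock", true_now),
    ("clock behind", skewed_back),
    ("clock ahead", skewed_ahead),
]:
    print(f"{label}: {check_validity(not_before, not_after, clock)}")
```

The same certificate is accepted with the correct clock and rejected with either drifted clock, which is why comparing the time on every master and node against a trusted source is worth doing even when the expiry playbooks report all certificates as valid.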