Wei,

It did work before, so a routing change at our core must have broken it. I
assume the routing issue was the actual root cause here; everything else was
just ancillary.

Thanks for all the help, the clusters are working now!
-Wally

On Thu, Feb 15, 2024 at 10:07 AM Wei ZHOU <[email protected]> wrote:

> Hi,
>
> As I understand:
> 1. After upgrading, you need to patch the system VMs or recreate them. Not
> a bug, I think.
> 2. A minor issue which does not impact the provisioning and operation of
> the CKS cluster.
> 3. Looks like a network misconfiguration, but did it work before?
>
>
> -Wei
>
>
> On Thu, 15 Feb 2024 at 16:39, Wally B <[email protected]> wrote:
>
> > As a quick add-on: after running those commands and getting the kubectl
> > commands working, the error in the management log is
> >
> > tail -f /var/log/cloudstack/management/management-server.log | grep ERROR
> >
> > 2024-02-15 14:09:41,124 ERROR [c.c.k.c.a.KubernetesClusterActionWorker]
> > (API-Job-Executor-4:ctx-29ed2b8e job-12348 ctx-3355553d) (logid:ae448a2e)
> > Failed to setup Kubernetes cluster : pz-dev-k8s-ncus-00001 in usable state
> > as unable to access control node VMs of the cluster
> >
> > 2024-02-15 14:09:41,129 ERROR [c.c.a.ApiAsyncJobDispatcher]
> > (API-Job-Executor-4:ctx-29ed2b8e job-12348) (logid:ae448a2e) Unexpected
> > exception while executing
> > org.apache.cloudstack.api.command.user.kubernetes.cluster.CreateKubernetesClusterCmd
> >
> > 2024-02-15 14:33:01,117 ERROR [c.c.k.c.a.KubernetesClusterActionWorker]
> > (API-Job-Executor-17:ctx-0685d548 job-12552 ctx-997de847) (logid:fda8fc82)
> > Failed to setup Kubernetes cluster : pz-dev-k8s-ncus-00001 in usable state
> > as unable to access control node VMs of the cluster
> >
> >
> > I did a quick Test-NetConnection from my PC to the control node and got:
> >
> >
> >
> > Test-NetConnection 99.xx.xx.xxx -p 6443
> >
> > ComputerName     : 99.xx.xx.xxx
> > RemoteAddress    : 99.xx.xx.xxx
> > RemotePort       : 6443
> > InterfaceAlias   : Ethernet
> > SourceAddress    : xxx.xxx.xxx.xxx
> > TcpTestSucceeded : True
> >
> >
> > So I ran the same test from my management hosts (on the same public IP
> > range as the Virtual Router's public IP) and got a TTL Expired response.
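For the archives, that reachability check can be approximated from a Linux host with a bash-only probe. This is a sketch, not the exact tooling used above; HOST and PORT are placeholders for the control node's public IP and the Kubernetes API port.

```shell
# Bash-only stand-in for Test-NetConnection: probe a TCP port via /dev/tcp.
# HOST and PORT are placeholders; point them at the control node's public IP
# and 6443 to mirror the check above.
HOST="${HOST:-127.0.0.1}"
PORT="${PORT:-6443}"
if timeout 3 bash -c ">/dev/tcp/${HOST}/${PORT}" 2>/dev/null; then
  TCP_TEST=True
else
  TCP_TEST=False
fi
echo "TcpTestSucceeded : ${TCP_TEST}"
```

From a host with working routing this should print True; from the management host in the broken state described above, the connect would fail instead.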
> >
> >
> >
> >
> > To wrap it up, there were 3 issues.
> >
> >
> > 1. Needed to delete and re-provision the Secondary Storage System Virtual
> > Machine after upgrading from 4.18.1 to 4.19.0
> > 2. Needed to fix additional control nodes not getting the kubeadm.conf
> > copied correctly (Wei's PR)
> > 3. Needed to fix some routing on our end since we were bouncing between
> > our L3 TOR -> Firewall <- ISP Routers
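For anyone triaging the same symptoms later, the management-log side can be checked with a quick grep. A sketch; LOG is a placeholder, and on a real management server it would point at /var/log/cloudstack/management/management-server.log.

```shell
# Hedged triage sketch: count CKS-related ERROR lines in a copy of
# management-server.log. LOG is a placeholder path; override it to point at
# the real log (or a saved copy of it).
LOG="${LOG:-$(mktemp)}"
CKS_ERRORS=$(grep -c 'ERROR .*Kubernetes' "$LOG" || true)
echo "CKS-related ERROR lines: ${CKS_ERRORS}"
```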
> >
> > Thanks again for all the help, everyone!
> > Wally
> >
> > On Thu, Feb 15, 2024 at 7:24 AM Wally B <[email protected]> wrote:
> >
> > > Thanks Wei ZHOU!
> > >
> > > That fixed the kubectl command issue but the cluster still just sits at
> > >
> > > Create Kubernetes cluster k8s-cluster-1 in progress
> > >
> > > Maybe this is just a UI issue? Unfortunately, if I stop the k8s cluster
> > > after it errors out, it just stays in the error state.
> > >
> > > 1. Click Stop Kubernetes cluster.
> > > 2. The UI says it successfully stopped.
> > > 3. Try to start the cluster, but the power button still says Stop
> > > Kubernetes cluster and the UI status stays in the error state.
> > >
> > >
> > > On Thu, Feb 15, 2024 at 7:02 AM Wei ZHOU <[email protected]> wrote:
> > >
> > >> Hi,
> > >>
> > >> Please run the following commands as root:
> > >>
> > >> mkdir -p /root/.kube
> > >> cp -i /etc/kubernetes/admin.conf /root/.kube/config
> > >>
> > >> After that, the kubectl commands should work.
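For reference, Wei's two commands can also be wrapped in a small, rehearsable sketch. SRC and DEST default to the real kubeadm paths but are overridable, so this can be dry-run on a machine that is not a control node; the echo messages are illustrative only.

```shell
# Sketch of the fix above: install the kubeadm-generated admin kubeconfig
# for root. SRC/DEST default to the standard paths; override them to rehearse
# this safely outside a control node.
SRC="${SRC:-/etc/kubernetes/admin.conf}"
DEST="${DEST:-/root/.kube/config}"
if [ -f "${SRC}" ]; then
  mkdir -p "$(dirname "${DEST}")"
  cp "${SRC}" "${DEST}"
  echo "kubeconfig installed at ${DEST}"
else
  echo "no admin.conf at ${SRC}; run this on a control node"
fi
```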
> > >>
> > >> -Wei
> > >>
> > >> On Thu, 15 Feb 2024 at 13:53, Wally B <[email protected]> wrote:
> > >>
> > >> > What command do you suggest I run?
> > >> >
> > >> > kubeconfig returns command not found
> > >> >
> > >> > In your PR I see kubeadm join being called out as well, but I wanted
> > >> > to verify what you wanted me to test first.
> > >> >
> > >> > On Thu, Feb 15, 2024 at 2:41 AM Wei ZHOU <[email protected]> wrote:
> > >> >
> > >> > > Hi Wally,
> > >> > >
> > >> > > I think the cluster is working fine.
> > >> > > The kubeconfig is missing on the extra nodes. I have just created a
> > >> > > PR for it: https://github.com/apache/cloudstack/pull/8658
> > >> > > You can run the commands on the control nodes, which should fix the
> > >> > > problem.
> > >> > >
> > >> > >
> > >> > > -Wei
> > >> > >
> > >> > > On Thu, 15 Feb 2024 at 09:31, Wally B <[email protected]> wrote:
> > >> > >
> > >> > > > 3 Control Nodes
> > >> > > >
> > >> > > > Control 1 -- No Errors
> > >> > > >
> > >> > > > kubectl get nodes
> > >> > > > NAME                                        STATUS   ROLES           AGE    VERSION
> > >> > > > pz-dev-k8s-ncus-00001-control-18dabdb141b   Ready    control-plane   2m6s   v1.28.4
> > >> > > > pz-dev-k8s-ncus-00001-control-18dabdb6ad6   Ready    control-plane   107s   v1.28.4
> > >> > > > pz-dev-k8s-ncus-00001-control-18dabdbc0a8   Ready    control-plane   108s   v1.28.4
> > >> > > > pz-dev-k8s-ncus-00001-node-18dabdc1644      Ready    <none>          115s   v1.28.4
> > >> > > > pz-dev-k8s-ncus-00001-node-18dabdc6c16      Ready    <none>          115s   v1.28.4
> > >> > > >
> > >> > > >
> > >> > > > kubectl get pods --all-namespaces
> > >> > > > NAMESPACE              NAME                                                                 READY   STATUS    RESTARTS        AGE
> > >> > > > kube-system            coredns-5dd5756b68-g84vk                                             1/1     Running   0               2m46s
> > >> > > > kube-system            coredns-5dd5756b68-kf92x                                             1/1     Running   0               2m46s
> > >> > > > kube-system            etcd-pz-dev-k8s-ncus-00001-control-18dabdb141b                       1/1     Running   0               2m50s
> > >> > > > kube-system            etcd-pz-dev-k8s-ncus-00001-control-18dabdb6ad6                       1/1     Running   0               2m16s
> > >> > > > kube-system            etcd-pz-dev-k8s-ncus-00001-control-18dabdbc0a8                       1/1     Running   0               2m37s
> > >> > > > kube-system            kube-apiserver-pz-dev-k8s-ncus-00001-control-18dabdb141b             1/1     Running   0               2m52s
> > >> > > > kube-system            kube-apiserver-pz-dev-k8s-ncus-00001-control-18dabdb6ad6             1/1     Running   1 (2m16s ago)   2m15s
> > >> > > > kube-system            kube-apiserver-pz-dev-k8s-ncus-00001-control-18dabdbc0a8             1/1     Running   0               2m37s
> > >> > > > kube-system            kube-controller-manager-pz-dev-k8s-ncus-00001-control-18dabdb141b    1/1     Running   1 (2m25s ago)   2m51s
> > >> > > > kube-system            kube-controller-manager-pz-dev-k8s-ncus-00001-control-18dabdb6ad6    1/1     Running   0               2m18s
> > >> > > > kube-system            kube-controller-manager-pz-dev-k8s-ncus-00001-control-18dabdbc0a8    1/1     Running   0               2m37s
> > >> > > > kube-system            kube-proxy-445qx                                                     1/1     Running   0               2m37s
> > >> > > > kube-system            kube-proxy-8swdg                                                     1/1     Running   0               2m2s
> > >> > > > kube-system            kube-proxy-bl9rx                                                     1/1     Running   0               2m47s
> > >> > > > kube-system            kube-proxy-pv8gj                                                     1/1     Running   0               2m43s
> > >> > > > kube-system            kube-proxy-v7cw2                                                     1/1     Running   0               2m43s
> > >> > > > kube-system            kube-scheduler-pz-dev-k8s-ncus-00001-control-18dabdb141b             1/1     Running   1 (2m22s ago)   2m50s
> > >> > > > kube-system            kube-scheduler-pz-dev-k8s-ncus-00001-control-18dabdb6ad6             1/1     Running   0               2m15s
> > >> > > > kube-system            kube-scheduler-pz-dev-k8s-ncus-00001-control-18dabdbc0a8             1/1     Running   0               2m37s
> > >> > > > kube-system            weave-net-8dvl5                                                      2/2     Running   0               2m37s
> > >> > > > kube-system            weave-net-c54bz                                                      2/2     Running   0               2m43s
> > >> > > > kube-system            weave-net-lv8l4                                                      2/2     Running   1 (2m42s ago)   2m47s
> > >> > > > kube-system            weave-net-vg6td                                                      2/2     Running   0               2m2s
> > >> > > > kube-system            weave-net-vq9s4                                                      2/2     Running   0               2m43s
> > >> > > > kubernetes-dashboard   dashboard-metrics-scraper-5657497c4c-4k886                           1/1     Running   0               2m46s
> > >> > > > kubernetes-dashboard   kubernetes-dashboard-5b749d9495-jpbxl                                1/1     Running   1 (2m22s ago)   2m46s
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > Control 2: Errors at the CLI
> > >> > > > Failed to start Execute cloud user/final scripts.
> > >> > > >
> > >> > > > kubectl get nodes
> > >> > > > E0215 08:27:07.797825    2772 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
> > >> > > > E0215 08:27:07.798759    2772 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
> > >> > > > E0215 08:27:07.801039    2772 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
> > >> > > > E0215 08:27:07.801977    2772 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
> > >> > > > E0215 08:27:07.804029    2772 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
> > >> > > > The connection to the server localhost:8080 was refused - did you specify the right host or port?
> > >> > > >
> > >> > > > kubectl get pods --all-namespaces
> > >> > > > E0215 08:29:41.818452    2811 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
> > >> > > > E0215 08:29:41.819935    2811 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
> > >> > > > E0215 08:29:41.820883    2811 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
> > >> > > > E0215 08:29:41.822680    2811 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
> > >> > > > E0215 08:29:41.823571    2811 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
> > >> > > > The connection to the server localhost:8080 was refused - did you specify the right host or port?
> > >> > > >
> > >> > > > Ping Google: Success
> > >> > > > Ping Control Node 1: Success
> > >> > > >
> > >> > > >
> > >> > > > Control 3: Errors at the CLI
> > >> > > > Failed to start Execute cloud user/final scripts.
> > >> > > > Failed to start deploy-kube-system.service.
> > >> > > >
> > >> > > > kubectl get nodes
> > >> > > > E0215 08:27:15.057313    2697 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
> > >> > > > E0215 08:27:15.058538    2697 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
> > >> > > > E0215 08:27:15.059260    2697 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
> > >> > > > E0215 08:27:15.061599    2697 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
> > >> > > > E0215 08:27:15.062029    2697 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
> > >> > > > The connection to the server localhost:8080 was refused - did you specify the right host or port?
> > >> > > >
> > >> > > > kubectl get pods --all-namespaces
> > >> > > > E0215 08:29:57.108716    2736 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
> > >> > > > E0215 08:29:57.109533    2736 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
> > >> > > > E0215 08:29:57.111372    2736 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
> > >> > > > E0215 08:29:57.112074    2736 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
> > >> > > > E0215 08:29:57.113956    2736 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
> > >> > > > The connection to the server localhost:8080 was refused - did you specify the right host or port?
> > >> > > >
> > >> > > >
> > >> > > > Ping Google: Success
> > >> > > > Ping Control Node 1: Success
> > >> > > >
> > >> > > >
> > >> > > > On Thu, Feb 15, 2024 at 2:17 AM Wei ZHOU <[email protected]> wrote:
> > >> > > >
> > >> > > > > Can you try with 3 control nodes ?
> > >> > > > >
> > >> > > > > -Wei
> > >> > > > >
> > >> > > > > On Thu, 15 Feb 2024 at 09:13, Wally B <[email protected]> wrote:
> > >> > > > >
> > >> > > > > > - zone type :
> > >> > > > > >         Core
> > >> > > > > > - network type:
> > >> > > > > >         Advanced
> > >> > > > > >         Isolated Network inside a Redundant VPC (same results
> > >> > > > > >         in just an Isolated network without VPC)
> > >> > > > > > - number of control nodes:
> > >> > > > > >         2 Control Nodes (HA Cluster)
> > >> > > > > >
> > >> > > > > > We were able to deploy k8s in the past; not sure what changed.
> > >> > > > > >
> > >> > > > > > Thanks!
> > >> > > > > > -Wally
> > >> > > > > >
> > >> > > > > > On Thu, Feb 15, 2024 at 2:04 AM Wei ZHOU <[email protected]> wrote:
> > >> > > > > >
> > >> > > > > > > Hi,
> > >> > > > > > >
> > >> > > > > > > can you share
> > >> > > > > > > - zone type
> > >> > > > > > > - network type
> > >> > > > > > > - number of control nodes
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > -Wei
> > >> > > > > > >
> > >> > > > > > > On Thu, 15 Feb 2024 at 08:52, Wally B <[email protected]> wrote:
> > >> > > > > > >
> > >> > > > > > > > So:
> > >> > > > > > > >
> > >> > > > > > > > Recreating the Secondary Storage VM fixed the cert issue and I was able
> > >> > > > > > > > to install the K8s 1.28.4 binaries. --- THANKS Wei ZHOU!
> > >> > > > > > > >
> > >> > > > > > > > I'm still getting
> > >> > > > > > > >
> > >> > > > > > > > [FAILED] Failed to start Execute cloud user/final scripts.
> > >> > > > > > > >
> > >> > > > > > > > on 1 control and 1 worker.
> > >> > > > > > > >
> > >> > > > > > > > *Control 1 -- pz-dev-k8s-ncus-00001-control-18dabaf66c1 :* No errors
> > >> > > > > > > > at the CLI
> > >> > > > > > > >
> > >> > > > > > > > kubectl get nodes
> > >> > > > > > > > NAME                                        STATUS   ROLES           AGE     VERSION
> > >> > > > > > > > pz-dev-k8s-ncus-00001-control-18dabaf0edb   Ready    control-plane   5m2s    v1.28.4
> > >> > > > > > > > pz-dev-k8s-ncus-00001-control-18dabaf66c1   Ready    control-plane   4m44s   v1.28.4
> > >> > > > > > > > pz-dev-k8s-ncus-00001-node-18dabafb0bd      Ready    <none>          4m47s   v1.28.4
> > >> > > > > > > > pz-dev-k8s-ncus-00001-node-18dabb006bc      Ready    <none>          4m47s   v1.28.4
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > kubectl get pods --all-namespaces
> > >> > > > > > > > NAMESPACE              NAME                                                                 READY   STATUS    RESTARTS        AGE
> > >> > > > > > > > kube-system            coredns-5dd5756b68-295gb                                             1/1     Running   0               5m32s
> > >> > > > > > > > kube-system            coredns-5dd5756b68-cdwvw                                             1/1     Running   0               5m33s
> > >> > > > > > > > kube-system            etcd-pz-dev-k8s-ncus-00001-control-18dabaf0edb                       1/1     Running   0               5m36s
> > >> > > > > > > > kube-system            etcd-pz-dev-k8s-ncus-00001-control-18dabaf66c1                       1/1     Running   0               5m23s
> > >> > > > > > > > kube-system            kube-apiserver-pz-dev-k8s-ncus-00001-control-18dabaf0edb             1/1     Running   0               5m36s
> > >> > > > > > > > kube-system            kube-apiserver-pz-dev-k8s-ncus-00001-control-18dabaf66c1             1/1     Running   0               5m23s
> > >> > > > > > > > kube-system            kube-controller-manager-pz-dev-k8s-ncus-00001-control-18dabaf0edb    1/1     Running   1 (5m13s ago)   5m36s
> > >> > > > > > > > kube-system            kube-controller-manager-pz-dev-k8s-ncus-00001-control-18dabaf66c1    1/1     Running   0               5m23s
> > >> > > > > > > > kube-system            kube-proxy-2m8zb                                                     1/1     Running   0               5m26s
> > >> > > > > > > > kube-system            kube-proxy-cwpjg                                                     1/1     Running   0               5m33s
> > >> > > > > > > > kube-system            kube-proxy-l2vbf                                                     1/1     Running   0               5m26s
> > >> > > > > > > > kube-system            kube-proxy-qhlqt                                                     1/1     Running   0               5m23s
> > >> > > > > > > > kube-system            kube-scheduler-pz-dev-k8s-ncus-00001-control-18dabaf0edb             1/1     Running   1 (5m8s ago)    5m36s
> > >> > > > > > > > kube-system            kube-scheduler-pz-dev-k8s-ncus-00001-control-18dabaf66c1             1/1     Running   0               5m23s
> > >> > > > > > > > kube-system            weave-net-5cs26                                                      2/2     Running   1 (5m9s ago)    5m26s
> > >> > > > > > > > kube-system            weave-net-9zqrw                                                      2/2     Running   1 (5m28s ago)   5m33s
> > >> > > > > > > > kube-system            weave-net-fcwtr                                                      2/2     Running   0               5m23s
> > >> > > > > > > > kube-system            weave-net-lh2dh                                                      2/2     Running   1 (4m41s ago)   5m26s
> > >> > > > > > > > kubernetes-dashboard   dashboard-metrics-scraper-5657497c4c-r284t                           1/1     Running   0               5m32s
> > >> > > > > > > > kubernetes-dashboard   kubernetes-dashboard-5b749d9495-vtwdd                                1/1     Running   0               5m32s
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > *Control 2 --- pz-dev-k8s-ncus-00001-control-18dabaf66c1 :* [FAILED]
> > >> > > > > > > > Failed to start Execute cloud user/final scripts.
> > >> > > > > > > > kubectl get nodes
> > >> > > > > > > > E0215 07:38:33.314561    2643 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
> > >> > > > > > > > E0215 07:38:33.316751    2643 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
> > >> > > > > > > > E0215 07:38:33.317754    2643 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
> > >> > > > > > > > E0215 07:38:33.319181    2643 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
> > >> > > > > > > > E0215 07:38:33.319975    2643 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
> > >> > > > > > > > The connection to the server localhost:8080 was refused - did you specify the right host or port?
> > >> > > > > > > >
> > >> > > > > > > > kubectl get pods --all-namespaces
> > >> > > > > > > > E0215 07:42:23.786704    2700 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
> > >> > > > > > > > E0215 07:42:23.787455    2700 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
> > >> > > > > > > > E0215 07:42:23.789529    2700 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
> > >> > > > > > > > E0215 07:42:23.790051    2700 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
> > >> > > > > > > > E0215 07:42:23.791742    2700 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
> > >> > > > > > > > The connection to the server localhost:8080 was refused - did you specify the right host or port?
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > */var/log/daemon.log*
> > >> > > > > > > > https://docs.google.com/document/d/1KuIx0jI4TuAXPgACY3rJQz3L2B8AjeqOL0Fm5r4YF5M/edit?usp=sharing
> > >> > > > > > > >
> > >> > > > > > > > */var/log/messages*
> > >> > > > > > > > https://docs.google.com/document/d/15xet6kxI9rdgi4RkIHqtn-Wywph4h1Coyt_cyrJYkv4/edit?usp=sharing
> > >> > > > > > > >
> > >> > > > > > > > On Thu, Feb 15, 2024 at 1:21 AM Wei ZHOU <[email protected]> wrote:
> > >> > > > > > > >
> > >> > > > > > > > > Destroy ssvm and retry when new ssvm is Up  ?
> > >> > > > > > > > >
> > >> > > > > > > > > -Wei
> > >> > > > > > > > >
> > >> > > > > > > > > On Thursday, February 15, 2024, Wally B <[email protected]> wrote:
> > >> > > > > > > > >
> > >> > > > > > > > > > Super weird. I have two other versions added successfully, but now when
> > >> > > > > > > > > > I try to add an ISO/version I get the following on the management host.
> > >> > > > > > > > > > This is the first time I've tried adding a K8s version since 4.18.0.
> > >> > > > > > > > > >
> > >> > > > > > > > > > tail -f /var/log/cloudstack/management/management-server.log | grep ERROR
> > >> > > > > > > > > >
> > >> > > > > > > > > > 2024-02-15 06:26:18,900 DEBUG [c.c.a.t.Request] (AgentManager-Handler-5:null) (logid:) Seq 48-6373437897659383816:
> > >> > > > > > > > > > Processing:  { Ans: , MgmtId: 15643723020152, via: 48, Ver: v1, Flags: 10,
> > >> > > > > > > > > > [{"com.cloud.agent.api.storage.DownloadAnswer":{"jobId":"39d72d08-ab48-47dd-b09a-eee3ed816f4d","downloadPct":"0",
> > >> > > > > > > > > > "errorString":"PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find
> > >> > > > > > > > > > valid certification path to requested target","downloadStatus":"DOWNLOAD_ERROR",
> > >> > > > > > > > > > "downloadPath":"/mnt/SecStorage/73075a0a-38a1-3631-8170-8887c04f6073/template/tmpl/1/223/dnld9180711723601784047tmp_",
> > >> > > > > > > > > > "installPath":"template/tmpl/1/223","templateSize":"(0 bytes) 0","templatePhySicalSize":"(0 bytes) 0",
> > >> > > > > > > > > > "checkSum":"4dfb9d8be2191bc8bc4b89d78795a5b","result":"true","details":"PKIX path building failed:
> > >> > > > > > > > > > sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested
> > >> > > > > > > > > > target","wait":"0","bypassHostMaintenance":"false"}}] }
> > >> > > > > > > > > >
> > >> > > > > > > > > > 2024-02-15 06:26:18,937 ERROR [o.a.c.s.i.BaseImageStoreDriverImpl] (RemoteHostEndPoint-5:ctx-55063062) (logid:e21177cb)
> > >> > > > > > > > > > Failed to register template: b6e79c5a-38d4-4cf5-8606-e6f209b6b4c2 with error: PKIX path building failed:
> > >> > > > > > > > > > sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
> > >> > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > > > On Wed, Feb 14, 2024 at 11:27 PM Wei ZHOU <[email protected]> wrote:
> > >> > > > > > > > > >
> > >> > > > > > > > > > > Can you try 1.27.8 or 1.28.4 on https://download.cloudstack.org/cks/ ?
> > >> > > > > > > > > > >
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > -Wei
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > On Thursday, February 15, 2024, Wally B <[email protected]> wrote:
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > > Hello Everyone!
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > We are currently attempting to deploy k8s clusters and are running
> > >> > > > > > > > > > > > into issues with the deployment.
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > Current CS Environment:
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > CloudStack Version: 4.19.0 (same issue before we upgraded from 4.18.1).
> > >> > > > > > > > > > > > Hypervisor Type: Ubuntu 20.04.03 KVM
> > >> > > > > > > > > > > > Attempted K8s Bins: 1.23.3, 1.27.3
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > ======== ISSUE =========
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > For some reason, when we attempt the cluster provisioning, all of the
> > >> > > > > > > > > > > > VMs start up and SSH keys are installed, but then on at least 1,
> > >> > > > > > > > > > > > sometimes 2, of the VMs (control and/or worker) we get:
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > [FAILED] Failed to start deploy-kube-system.service.
> > >> > > > > > > > > > > > [FAILED] Failed to start Execute cloud user/final scripts.
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > The CloudStack UI just says:
> > >> > > > > > > > > > > > Create Kubernetes cluster test-cluster in progress
> > >> > > > > > > > > > > > for about an hour (I assume this is the 3600-second timeout) and then
> > >> > > > > > > > > > > > fails.
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > In the user's event log it stays on:
> > >> > > > > > > > > > > > INFO KUBERNETES.CLUSTER.CREATE
> > >> > > > > > > > > > > > Scheduled
> > >> > > > > > > > > > > > Creating Kubernetes cluster. Cluster Id: XXX
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > I can SSH into the VMs with their assigned private keys. I attempted
> > >> > > > > > > > > > > > to run the deploy-kube-system script, but it just says "already
> > >> > > > > > > > > > > > provisioned!" I'm not sure how I would re-run the "Execute cloud
> > >> > > > > > > > > > > > user/final scripts" step. If I attempt to stop the cluster and start
> > >> > > > > > > > > > > > it again, nothing seems to change.
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > Any help would be appreciated; I can provide any details as they are
> > >> > > > > > > > > > > > needed!
> > >> > > > > > > > > > > >
> > >> > > > > > > > > > > > Thanks!
> > >> > > > > > > > > > > > Wally
