As a quick add-on: after running those commands and getting the kubectl
commands working, the error in the management log is:

tail -f /var/log/cloudstack/management/management-server.log | grep ERROR

2024-02-15 14:09:41,124 ERROR [c.c.k.c.a.KubernetesClusterActionWorker]
(API-Job-Executor-4:ctx-29ed2b8e job-12348 ctx-3355553d) (logid:ae448a2e)
Failed to setup Kubernetes cluster : pz-dev-k8s-ncus-00001 in usable state
as unable to access control node VMs of the cluster

2024-02-15 14:09:41,129 ERROR [c.c.a.ApiAsyncJobDispatcher]
(API-Job-Executor-4:ctx-29ed2b8e job-12348) (logid:ae448a2e) Unexpected
exception while executing
org.apache.cloudstack.api.command.user.kubernetes.cluster.CreateKubernetesClusterCmd

2024-02-15 14:33:01,117 ERROR [c.c.k.c.a.KubernetesClusterActionWorker]
(API-Job-Executor-17:ctx-0685d548 job-12552 ctx-997de847) (logid:fda8fc82)
Failed to setup Kubernetes cluster : pz-dev-k8s-ncus-00001 in usable state
as unable to access control node VMs of the cluster
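
For context, that check is the management server reaching the control nodes
over SSH through port forwards on the cluster's public IP. A rough way to
reproduce it by hand from the management host (a sketch, assuming the default
CKS port-forward range starting at 2222 and the default "cloud" node user;
the key path and IP below are placeholders):

# Port 2222 should forward to the first control node; substitute the
# cluster's real SSH key path and its (masked) public IP.
ssh -i <cluster-ssh-private-key> -p 2222 cloud@99.xx.xx.xxx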


I did a quick Test-NetConnection from my PC to the control node and got:



Test-NetConnection 99.xx.xx.xxx -p 6443



ComputerName     : 99.xx.xx.xxx
RemoteAddress    : 99.xx.xx.xxx
RemotePort       : 6443
InterfaceAlias   : Ethernet
SourceAddress    : xxx.xxx.xxx.xxx
TcpTestSucceeded : True


So I ran the same test from my management hosts (which are on the same
public IP range as the virtual router's public IP), and I got a TTL
Expired reply.
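
In case anyone else hits the same symptom: a TTL Expired reply usually means
the packet is looping between hops. A minimal way to confirm it from the
management host (a sketch; the public VIP is masked the same way as above):

# A routing loop shows up as the same pair of hops repeating until
# the TTL runs out.
traceroute -n 99.xx.xx.xxx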




To wrap it up, there were three issues:


1. Needed to delete and re-provision the Secondary Storage System Virtual
Machine after upgrading from 4.18.1 to 4.19.0 (see the sketch after this list)
2. Needed to fix the additional control nodes not getting kubeadm.conf
copied correctly (Wei's PR)
3. Needed to fix some routing on our end, since we were bouncing between
our L3 TOR -> Firewall <- ISP routers
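
For issue 1, a rough sketch of the SSVM re-provisioning, assuming the
CloudMonkey CLI (cmk) is configured against the management server; once the
old system VM is destroyed, CloudStack provisions a replacement automatically:

# Look up the SSVM's id, destroy it, then watch the replacement come up.
cmk list systemvms systemvmtype=secondarystoragevm
cmk destroy systemvm id=<ssvm-uuid>
cmk list systemvms systemvmtype=secondarystoragevm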

Thanks again for all the help, everyone!
Wally

On Thu, Feb 15, 2024 at 7:24 AM Wally B <wvbauman...@gmail.com> wrote:

> Thanks Wei ZHOU!
>
> That fixed the kubectl command issue, but the cluster still just sits at
>
> Create Kubernetes cluster k8s-cluster-1 in progress
>
> Maybe this is just a UI issue? Unfortunately, if I stop the k8s cluster
> after it errors out, it just stays in the error state.
>
> 1. Click "Stop Kubernetes cluster".
> 2. The UI says it successfully stopped.
> 3. Try to start the cluster, but the power button still says "Stop
> Kubernetes cluster" and the UI status stays in the error state.
>
>
> On Thu, Feb 15, 2024 at 7:02 AM Wei ZHOU <ustcweiz...@gmail.com> wrote:
>
>> Hi,
>>
>> Please run the following commands as root:
>>
>> mkdir -p /root/.kube
>> cp -i /etc/kubernetes/admin.conf /root/.kube/config
>>
>> After that, the kubectl commands should work.
>>
>> -Wei
>>
>> On Thu, 15 Feb 2024 at 13:53, Wally B <wvbauman...@gmail.com> wrote:
>>
>> > What command do you suggest I run?
>> >
>> > kubeconfig returns "command not found".
>> >
>> > In your PR I see kubeadm join being called out as well, but I wanted to
>> > verify what you wanted me to test first.
>> >
>> > On Thu, Feb 15, 2024 at 2:41 AM Wei ZHOU <ustcweiz...@gmail.com> wrote:
>> >
>> > > Hi Wally,
>> > >
>> > > I think the cluster is working fine.
>> > > The kubeconfig is missing on the extra nodes. I have just created a
>> > > PR for it: https://github.com/apache/cloudstack/pull/8658
>> > > You can run the command on the control nodes, which should fix the
>> > > problem.
>> > >
>> > >
>> > > -Wei
>> > >
>> > > On Thu, 15 Feb 2024 at 09:31, Wally B <wvbauman...@gmail.com> wrote:
>> > >
>> > > > 3 Nodes
>> > > >
>> > > > Control 1 -- No Errors
>> > > >
>> > > > kubectl get nodes
>> > > > NAME                                        STATUS   ROLES           AGE    VERSION
>> > > > pz-dev-k8s-ncus-00001-control-18dabdb141b   Ready    control-plane   2m6s   v1.28.4
>> > > > pz-dev-k8s-ncus-00001-control-18dabdb6ad6   Ready    control-plane   107s   v1.28.4
>> > > > pz-dev-k8s-ncus-00001-control-18dabdbc0a8   Ready    control-plane   108s   v1.28.4
>> > > > pz-dev-k8s-ncus-00001-node-18dabdc1644      Ready    <none>          115s   v1.28.4
>> > > > pz-dev-k8s-ncus-00001-node-18dabdc6c16      Ready    <none>          115s   v1.28.4
>> > > >
>> > > >
>> > > > kubectl get pods --all-namespaces
>> > > > NAMESPACE              NAME                                                                 READY   STATUS    RESTARTS        AGE
>> > > > kube-system            coredns-5dd5756b68-g84vk                                             1/1     Running   0               2m46s
>> > > > kube-system            coredns-5dd5756b68-kf92x                                             1/1     Running   0               2m46s
>> > > > kube-system            etcd-pz-dev-k8s-ncus-00001-control-18dabdb141b                       1/1     Running   0               2m50s
>> > > > kube-system            etcd-pz-dev-k8s-ncus-00001-control-18dabdb6ad6                       1/1     Running   0               2m16s
>> > > > kube-system            etcd-pz-dev-k8s-ncus-00001-control-18dabdbc0a8                       1/1     Running   0               2m37s
>> > > > kube-system            kube-apiserver-pz-dev-k8s-ncus-00001-control-18dabdb141b             1/1     Running   0               2m52s
>> > > > kube-system            kube-apiserver-pz-dev-k8s-ncus-00001-control-18dabdb6ad6             1/1     Running   1 (2m16s ago)   2m15s
>> > > > kube-system            kube-apiserver-pz-dev-k8s-ncus-00001-control-18dabdbc0a8             1/1     Running   0               2m37s
>> > > > kube-system            kube-controller-manager-pz-dev-k8s-ncus-00001-control-18dabdb141b    1/1     Running   1 (2m25s ago)   2m51s
>> > > > kube-system            kube-controller-manager-pz-dev-k8s-ncus-00001-control-18dabdb6ad6    1/1     Running   0               2m18s
>> > > > kube-system            kube-controller-manager-pz-dev-k8s-ncus-00001-control-18dabdbc0a8    1/1     Running   0               2m37s
>> > > > kube-system            kube-proxy-445qx                                                     1/1     Running   0               2m37s
>> > > > kube-system            kube-proxy-8swdg                                                     1/1     Running   0               2m2s
>> > > > kube-system            kube-proxy-bl9rx                                                     1/1     Running   0               2m47s
>> > > > kube-system            kube-proxy-pv8gj                                                     1/1     Running   0               2m43s
>> > > > kube-system            kube-proxy-v7cw2                                                     1/1     Running   0               2m43s
>> > > > kube-system            kube-scheduler-pz-dev-k8s-ncus-00001-control-18dabdb141b             1/1     Running   1 (2m22s ago)   2m50s
>> > > > kube-system            kube-scheduler-pz-dev-k8s-ncus-00001-control-18dabdb6ad6             1/1     Running   0               2m15s
>> > > > kube-system            kube-scheduler-pz-dev-k8s-ncus-00001-control-18dabdbc0a8             1/1     Running   0               2m37s
>> > > > kube-system            weave-net-8dvl5                                                      2/2     Running   0               2m37s
>> > > > kube-system            weave-net-c54bz                                                      2/2     Running   0               2m43s
>> > > > kube-system            weave-net-lv8l4                                                      2/2     Running   1 (2m42s ago)   2m47s
>> > > > kube-system            weave-net-vg6td                                                      2/2     Running   0               2m2s
>> > > > kube-system            weave-net-vq9s4                                                      2/2     Running   0               2m43s
>> > > > kubernetes-dashboard   dashboard-metrics-scraper-5657497c4c-4k886                           1/1     Running   0               2m46s
>> > > > kubernetes-dashboard   kubernetes-dashboard-5b749d9495-jpbxl                                1/1     Running   1 (2m22s ago)   2m46s
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > Control 2: Errors at the CLI
>> > > > Failed to start Execute cloud user/final scripts.
>> > > >
>> > > > kubectl get nodes
>> > > > E0215 08:27:07.797825    2772 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
>> > > > E0215 08:27:07.798759    2772 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
>> > > > E0215 08:27:07.801039    2772 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
>> > > > E0215 08:27:07.801977    2772 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
>> > > > E0215 08:27:07.804029    2772 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
>> > > > The connection to the server localhost:8080 was refused - did you specify the right host or port?
>> > > >
>> > > > kubectl get pods --all-namespaces
>> > > > E0215 08:29:41.818452    2811 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
>> > > > E0215 08:29:41.819935    2811 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
>> > > > E0215 08:29:41.820883    2811 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
>> > > > E0215 08:29:41.822680    2811 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
>> > > > E0215 08:29:41.823571    2811 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
>> > > > The connection to the server localhost:8080 was refused - did you specify the right host or port?
>> > > >
>> > > > Ping Google: Success
>> > > > Ping Control Node 1: Success
>> > > >
>> > > >
>> > > > Control 3: Errors at the CLI
>> > > > Failed to start Execute cloud user/final scripts.
>> > > > Failed to start deploy-kube-system.service.
>> > > >
>> > > > kubectl get nodes
>> > > > E0215 08:27:15.057313    2697 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
>> > > > E0215 08:27:15.058538    2697 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
>> > > > E0215 08:27:15.059260    2697 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
>> > > > E0215 08:27:15.061599    2697 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
>> > > > E0215 08:27:15.062029    2697 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
>> > > > The connection to the server localhost:8080 was refused - did you specify the right host or port?
>> > > >
>> > > >
>> > > > kubectl get pods --all-namespaces
>> > > > E0215 08:29:57.108716    2736 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
>> > > > E0215 08:29:57.109533    2736 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
>> > > > E0215 08:29:57.111372    2736 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
>> > > > E0215 08:29:57.112074    2736 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
>> > > > E0215 08:29:57.113956    2736 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
>> > > > The connection to the server localhost:8080 was refused - did you specify the right host or port?
>> > > >
>> > > >
>> > > > Ping Google: Success
>> > > > Ping Control Node 1: Success
>> > > >
>> > > >
>> > > > On Thu, Feb 15, 2024 at 2:17 AM Wei ZHOU <ustcweiz...@gmail.com> wrote:
>> > > >
>> > > > > Can you try with 3 control nodes?
>> > > > >
>> > > > > -Wei
>> > > > >
>> > > > > On Thu, 15 Feb 2024 at 09:13, Wally B <wvbauman...@gmail.com> wrote:
>> > > > >
>> > > > > > - zone type:
>> > > > > >         Core
>> > > > > > - network type:
>> > > > > >         Advanced
>> > > > > >         Isolated Network inside a Redundant VPC (same results in just
>> > > > > >         an Isolated Network without VPC)
>> > > > > > - number of control nodes:
>> > > > > >         2 Control Nodes (HA Cluster)
>> > > > > >
>> > > > > > We were able to deploy k8s in the past; not sure what changed.
>> > > > > >
>> > > > > > Thanks!
>> > > > > > -Wally
>> > > > > >
>> > > > > > On Thu, Feb 15, 2024 at 2:04 AM Wei ZHOU <ustcweiz...@gmail.com> wrote:
>> > > > > >
>> > > > > > > Hi,
>> > > > > > >
>> > > > > > > can you share
>> > > > > > > - zone type
>> > > > > > > - network type
>> > > > > > > - number of control nodes
>> > > > > > >
>> > > > > > >
>> > > > > > > -Wei
>> > > > > > >
>> > > > > > > On Thu, 15 Feb 2024 at 08:52, Wally B <wvbauman...@gmail.com> wrote:
>> > > > > > >
>> > > > > > > > So:
>> > > > > > > >
>> > > > > > > > Recreating the Secondary Storage VM fixed the cert issue, and I was
>> > > > > > > > able to install the K8s 1.28.4 binaries. --- THANKS Wei ZHOU!
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > I'm still getting
>> > > > > > > >
>> > > > > > > > [FAILED] Failed to start Execute cloud user/final scripts.
>> > > > > > > >
>> > > > > > > > on 1 control and 1 worker.
>> > > > > > > >
>> > > > > > > > *Control 1 -- pz-dev-k8s-ncus-00001-control-18dabaf66c1 :* No
>> > > > > > > > errors at the CLI
>> > > > > > > >
>> > > > > > > > kubectl get nodes
>> > > > > > > > NAME                                        STATUS   ROLES           AGE     VERSION
>> > > > > > > > pz-dev-k8s-ncus-00001-control-18dabaf0edb   Ready    control-plane   5m2s    v1.28.4
>> > > > > > > > pz-dev-k8s-ncus-00001-control-18dabaf66c1   Ready    control-plane   4m44s   v1.28.4
>> > > > > > > > pz-dev-k8s-ncus-00001-node-18dabafb0bd      Ready    <none>          4m47s   v1.28.4
>> > > > > > > > pz-dev-k8s-ncus-00001-node-18dabb006bc      Ready    <none>          4m47s   v1.28.4
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > kubectl get pods --all-namespaces
>> > > > > > > > NAMESPACE              NAME                                                                 READY   STATUS    RESTARTS        AGE
>> > > > > > > > kube-system            coredns-5dd5756b68-295gb                                             1/1     Running   0               5m32s
>> > > > > > > > kube-system            coredns-5dd5756b68-cdwvw                                             1/1     Running   0               5m33s
>> > > > > > > > kube-system            etcd-pz-dev-k8s-ncus-00001-control-18dabaf0edb                       1/1     Running   0               5m36s
>> > > > > > > > kube-system            etcd-pz-dev-k8s-ncus-00001-control-18dabaf66c1                       1/1     Running   0               5m23s
>> > > > > > > > kube-system            kube-apiserver-pz-dev-k8s-ncus-00001-control-18dabaf0edb             1/1     Running   0               5m36s
>> > > > > > > > kube-system            kube-apiserver-pz-dev-k8s-ncus-00001-control-18dabaf66c1             1/1     Running   0               5m23s
>> > > > > > > > kube-system            kube-controller-manager-pz-dev-k8s-ncus-00001-control-18dabaf0edb    1/1     Running   1 (5m13s ago)   5m36s
>> > > > > > > > kube-system            kube-controller-manager-pz-dev-k8s-ncus-00001-control-18dabaf66c1    1/1     Running   0               5m23s
>> > > > > > > > kube-system            kube-proxy-2m8zb                                                     1/1     Running   0               5m26s
>> > > > > > > > kube-system            kube-proxy-cwpjg                                                     1/1     Running   0               5m33s
>> > > > > > > > kube-system            kube-proxy-l2vbf                                                     1/1     Running   0               5m26s
>> > > > > > > > kube-system            kube-proxy-qhlqt                                                     1/1     Running   0               5m23s
>> > > > > > > > kube-system            kube-scheduler-pz-dev-k8s-ncus-00001-control-18dabaf0edb             1/1     Running   1 (5m8s ago)    5m36s
>> > > > > > > > kube-system            kube-scheduler-pz-dev-k8s-ncus-00001-control-18dabaf66c1             1/1     Running   0               5m23s
>> > > > > > > > kube-system            weave-net-5cs26                                                      2/2     Running   1 (5m9s ago)    5m26s
>> > > > > > > > kube-system            weave-net-9zqrw                                                      2/2     Running   1 (5m28s ago)   5m33s
>> > > > > > > > kube-system            weave-net-fcwtr                                                      2/2     Running   0               5m23s
>> > > > > > > > kube-system            weave-net-lh2dh                                                      2/2     Running   1 (4m41s ago)   5m26s
>> > > > > > > > kubernetes-dashboard   dashboard-metrics-scraper-5657497c4c-r284t                           1/1     Running   0               5m32s
>> > > > > > > > kubernetes-dashboard   kubernetes-dashboard-5b749d9495-vtwdd                                1/1     Running   0               5m32s
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > *Control 2 --- pz-dev-k8s-ncus-00001-control-18dabaf66c1 :*
>> > > > > > > > [FAILED] Failed to start Execute cloud user/final scripts.
>> > > > > > > >
>> > > > > > > > kubectl get nodes
>> > > > > > > > E0215 07:38:33.314561    2643 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
>> > > > > > > > E0215 07:38:33.316751    2643 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
>> > > > > > > > E0215 07:38:33.317754    2643 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
>> > > > > > > > E0215 07:38:33.319181    2643 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
>> > > > > > > > E0215 07:38:33.319975    2643 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
>> > > > > > > > The connection to the server localhost:8080 was refused - did you specify the right host or port?
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > kubectl get pods --all-namespaces
>> > > > > > > > E0215 07:42:23.786704    2700 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
>> > > > > > > > E0215 07:42:23.787455    2700 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
>> > > > > > > > E0215 07:42:23.789529    2700 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
>> > > > > > > > E0215 07:42:23.790051    2700 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
>> > > > > > > > E0215 07:42:23.791742    2700 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
>> > > > > > > > The connection to the server localhost:8080 was refused - did you specify the right host or port?
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > */var/log/daemon.log*
>> > > > > > > > https://docs.google.com/document/d/1KuIx0jI4TuAXPgACY3rJQz3L2B8AjeqOL0Fm5r4YF5M/edit?usp=sharing
>> > > > > > > >
>> > > > > > > > */var/log/messages*
>> > > > > > > > https://docs.google.com/document/d/15xet6kxI9rdgi4RkIHqtn-Wywph4h1Coyt_cyrJYkv4/edit?usp=sharing
>> > > > > > > >
>> > > > > > > > On Thu, Feb 15, 2024 at 1:21 AM Wei ZHOU <ustcweiz...@gmail.com> wrote:
>> > > > > > > >
>> > > > > > > > > Destroy the SSVM and retry when the new SSVM is up?
>> > > > > > > > >
>> > > > > > > > > -Wei
>> > > > > > > > >
>> > > > > > > > > On Thursday, February 15, 2024, Wally B <wvbauman...@gmail.com> wrote:
>> > > > > > > > >
>> > > > > > > > > > Super weird. I have two other versions added successfully, but
>> > > > > > > > > > now when I try to add an ISO/version I get the following on the
>> > > > > > > > > > management host. This is the first time I've tried adding a K8s
>> > > > > > > > > > version since 4.18.0.
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > tail -f /var/log/cloudstack/management/management-server.log | grep ERROR
>> > > > > > > > > >
>> > > > > > > > > > 2024-02-15 06:26:18,900 DEBUG [c.c.a.t.Request]
>> > > > > > > > > > (AgentManager-Handler-5:null) (logid:) Seq 48-6373437897659383816:
>> > > > > > > > > > Processing:  { Ans: , MgmtId: 15643723020152, via: 48, Ver: v1,
>> > > > > > > > > > Flags: 10, [{"com.cloud.agent.api.storage.DownloadAnswer":
>> > > > > > > > > > {"jobId":"39d72d08-ab48-47dd-b09a-eee3ed816f4d","downloadPct":"0",
>> > > > > > > > > > "errorString":"PKIX path building failed:
>> > > > > > > > > > sun.security.provider.certpath.SunCertPathBuilderException:
>> > > > > > > > > > unable to find valid certification path to requested target",
>> > > > > > > > > > "downloadStatus":"DOWNLOAD_ERROR","downloadPath":
>> > > > > > > > > > "/mnt/SecStorage/73075a0a-38a1-3631-8170-8887c04f6073/template/tmpl/1/223/dnld9180711723601784047tmp_",
>> > > > > > > > > > "installPath":"template/tmpl/1/223","templateSize":"(0 bytes) 0",
>> > > > > > > > > > "templatePhySicalSize":"(0 bytes) 0",
>> > > > > > > > > > "checkSum":"4dfb9d8be2191bc8bc4b89d78795a5b","result":"true",
>> > > > > > > > > > "details":"PKIX path building failed:
>> > > > > > > > > > sun.security.provider.certpath.SunCertPathBuilderException:
>> > > > > > > > > > unable to find valid certification path to requested target",
>> > > > > > > > > > "wait":"0","bypassHostMaintenance":"false"}}] }
>> > > > > > > > > >
>> > > > > > > > > > 2024-02-15 06:26:18,937 ERROR [o.a.c.s.i.BaseImageStoreDriverImpl]
>> > > > > > > > > > (RemoteHostEndPoint-5:ctx-55063062) (logid:e21177cb) Failed to
>> > > > > > > > > > register template: b6e79c5a-38d4-4cf5-8606-e6f209b6b4c2 with error:
>> > > > > > > > > > PKIX path building failed:
>> > > > > > > > > > sun.security.provider.certpath.SunCertPathBuilderException: unable
>> > > > > > > > > > to find valid certification path to requested target
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > On Wed, Feb 14, 2024 at 11:27 PM Wei ZHOU <ustcweiz...@gmail.com> wrote:
>> > > > > > > > > >
>> > > > > > > > > > > Can you try 1.27.8 or 1.28.4 on https://download.cloudstack.org/cks/ ?
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > -Wei
>> > > > > > > > > > >
>> > > > > > > > > > > On Thursday, February 15, 2024, Wally B <wvbauman...@gmail.com> wrote:
>> > > > > > > > > > >
>> > > > > > > > > > > > Hello Everyone!
>> > > > > > > > > > > >
>> > > > > > > > > > > > We are currently attempting to deploy k8s clusters and are
>> > > > > > > > > > > > running into issues with the deployment.
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > Current CS Environment:
>> > > > > > > > > > > >
>> > > > > > > > > > > > CloudStack Version: 4.19.0 (same issue before we upgraded
>> > > > > > > > > > > > from 4.18.1).
>> > > > > > > > > > > > Hypervisor Type: Ubuntu 20.04.3 KVM
>> > > > > > > > > > > > Attempted K8s Bins: 1.23.3, 1.27.3
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > ======== ISSUE =========
>> > > > > > > > > > > >
>> > > > > > > > > > > > For some reason, when we attempt the cluster provisioning, all
>> > > > > > > > > > > > of the VMs start up and SSH keys are installed, but then on at
>> > > > > > > > > > > > least one, sometimes two, of the VMs (control and/or worker)
>> > > > > > > > > > > > we get:
>> > > > > > > > > > > >
>> > > > > > > > > > > > [FAILED] Failed to start deploy-kube-system.service.
>> > > > > > > > > > > > [FAILED] Failed to start Execute cloud user/final scripts.
>> > > > > > > > > > > >
>> > > > > > > > > > > > The CloudStack UI just says:
>> > > > > > > > > > > > Create Kubernetes cluster test-cluster in progress
>> > > > > > > > > > > > for about an hour (I assume this is the 3600-second timeout)
>> > > > > > > > > > > > and then fails.
>> > > > > > > > > > > >
>> > > > > > > > > > > > In the user's event log it stays on:
>> > > > > > > > > > > > INFO KUBERNETES.CLUSTER.CREATE
>> > > > > > > > > > > > Scheduled
>> > > > > > > > > > > > Creating Kubernetes cluster. Cluster Id: XXX
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > I can SSH into the VMs with their assigned private keys. I
>> > > > > > > > > > > > attempted to run the deploy-kube-system script, but it just
>> > > > > > > > > > > > says "already provisioned!" I'm not sure how I would execute
>> > > > > > > > > > > > the cloud user/final scripts. If I attempt to stop the cluster
>> > > > > > > > > > > > and start it again, nothing seems to change.
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > Any help would be appreciated; I can provide any details as
>> > > > > > > > > > > > they are needed!
>> > > > > > > > > > > >
>> > > > > > > > > > > > Thanks!
>> > > > > > > > > > > > Wally
>> > > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
