So, recreating the Sec Storage VM fixed the cert issue and I was able to install the K8s 1.28.4 binaries. Thanks, Wei ZHOU!

I'm still getting

[FAILED] Failed to start Execute cloud user/final scripts.

on 1 control and 1 worker.

*Control 1 -- pz-dev-k8s-ncus-00001-control-18dabaf66c1 -- :*

No errors at the CLI

kubectl get nodes
NAME                                        STATUS   ROLES           AGE     VERSION
pz-dev-k8s-ncus-00001-control-18dabaf0edb   Ready    control-plane   5m2s    v1.28.4
pz-dev-k8s-ncus-00001-control-18dabaf66c1   Ready    control-plane   4m44s   v1.28.4
pz-dev-k8s-ncus-00001-node-18dabafb0bd      Ready    <none>          4m47s   v1.28.4
pz-dev-k8s-ncus-00001-node-18dabb006bc      Ready    <none>          4m47s   v1.28.4

kubectl get pods --all-namespaces
NAMESPACE             NAME                                                               READY   STATUS    RESTARTS        AGE
kube-system           coredns-5dd5756b68-295gb                                           1/1     Running   0               5m32s
kube-system           coredns-5dd5756b68-cdwvw                                           1/1     Running   0               5m33s
kube-system           etcd-pz-dev-k8s-ncus-00001-control-18dabaf0edb                     1/1     Running   0               5m36s
kube-system           etcd-pz-dev-k8s-ncus-00001-control-18dabaf66c1                     1/1     Running   0               5m23s
kube-system           kube-apiserver-pz-dev-k8s-ncus-00001-control-18dabaf0edb           1/1     Running   0               5m36s
kube-system           kube-apiserver-pz-dev-k8s-ncus-00001-control-18dabaf66c1           1/1     Running   0               5m23s
kube-system           kube-controller-manager-pz-dev-k8s-ncus-00001-control-18dabaf0edb  1/1     Running   1 (5m13s ago)   5m36s
kube-system           kube-controller-manager-pz-dev-k8s-ncus-00001-control-18dabaf66c1  1/1     Running   0               5m23s
kube-system           kube-proxy-2m8zb                                                   1/1     Running   0               5m26s
kube-system           kube-proxy-cwpjg                                                   1/1     Running   0               5m33s
kube-system           kube-proxy-l2vbf                                                   1/1     Running   0               5m26s
kube-system           kube-proxy-qhlqt                                                   1/1     Running   0               5m23s
kube-system           kube-scheduler-pz-dev-k8s-ncus-00001-control-18dabaf0edb           1/1     Running   1 (5m8s ago)    5m36s
kube-system           kube-scheduler-pz-dev-k8s-ncus-00001-control-18dabaf66c1           1/1     Running   0               5m23s
kube-system           weave-net-5cs26                                                    2/2     Running   1 (5m9s ago)    5m26s
kube-system           weave-net-9zqrw                                                    2/2     Running   1 (5m28s ago)   5m33s
kube-system           weave-net-fcwtr                                                    2/2     Running   0               5m23s
kube-system           weave-net-lh2dh                                                    2/2     Running   1 (4m41s ago)   5m26s
kubernetes-dashboard  dashboard-metrics-scraper-5657497c4c-r284t                         1/1     Running   0               5m32s
kubernetes-dashboard  kubernetes-dashboard-5b749d9495-vtwdd                              1/1     Running   0               5m32s

*Control 2 --- pz-dev-k8s-ncus-00001-control-18dabaf66c1 :*

[FAILED] Failed to start Execute cloud user/final scripts.

kubectl get nodes
E0215 07:38:33.314561 2643 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E0215 07:38:33.316751 2643 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E0215 07:38:33.317754 2643 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E0215 07:38:33.319181 2643 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E0215 07:38:33.319975 2643 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
The connection to the server localhost:8080 was refused - did you specify the right host or port?
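
A note on the `localhost:8080` refusals above: kubectl falls back to that address when it finds no kubeconfig, so the error may only mean that the current user has no kubeconfig on this node, not that the API server is down. The following is a minimal sketch for checking that, assuming the node was bootstrapped with kubeadm (which writes `/etc/kubernetes/admin.conf` on control-plane nodes). The function name `find_kubeconfig` and the `ADMIN_CONF` override are hypothetical, introduced only for illustration.

```shell
#!/bin/sh
# Sketch: print the first readable kubeconfig that kubectl could be pointed at.
# ADMIN_CONF is a hypothetical override used for illustration; its default is
# the path kubeadm writes on control-plane nodes.
find_kubeconfig() {
    # Assumes KUBECONFIG, if set, holds a single path (not a colon-separated list).
    for candidate in "${KUBECONFIG:-}" "${HOME}/.kube/config" \
                     "${ADMIN_CONF:-/etc/kubernetes/admin.conf}"; do
        if [ -n "$candidate" ] && [ -r "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Example use on the affected node (admin.conf is usually readable by root only):
#   cfg=$(find_kubeconfig) && kubectl --kubeconfig "$cfg" get nodes
```

If the function prints nothing when run as root, no kubeconfig was found in any of the three places. That would be consistent with the failed final script not having finished, though this is a guess and not something the logs above confirm.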

kubectl get pods --all-namespaces
E0215 07:42:23.786704 2700 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E0215 07:42:23.787455 2700 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E0215 07:42:23.789529 2700 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E0215 07:42:23.790051 2700 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E0215 07:42:23.791742 2700 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
The connection to the server localhost:8080 was refused - did you specify the right host or port?

*/var/log/daemon.log*
https://docs.google.com/document/d/1KuIx0jI4TuAXPgACY3rJQz3L2B8AjeqOL0Fm5r4YF5M/edit?usp=sharing

*/var/log/messages*
https://docs.google.com/document/d/15xet6kxI9rdgi4RkIHqtn-Wywph4h1Coyt_cyrJYkv4/edit?usp=sharing

On Thu, Feb 15, 2024 at 1:21 AM Wei ZHOU <ustcweiz...@gmail.com> wrote:

> Destroy ssvm and retry when new ssvm is Up ?
>
> -Wei
>
> On Thursday, February 15, 2024, Wally B <wvbauman...@gmail.com> wrote:
>
> > Super Weird. I have two other versions added successfully but now when I
> > try to add an ISO/version I get the following on the management host.
> > This is the first time I've tried adding a K8s version since 4.18.0
> >
> > tail -f /var/log/cloudstack/management/management-server.log | grep ERROR
> >
> > 2024-02-15 06:26:18,900 DEBUG [c.c.a.t.Request] (AgentManager-Handler-5:null) (logid:) Seq 48-6373437897659383816: Processing: { Ans: , MgmtId: 15643723020152, via: 48, Ver: v1, Flags: 10, [{"com.cloud.agent.api.storage.DownloadAnswer":{"jobId":"39d72d08-ab48-47dd-b09a-eee3ed816f4d","downloadPct":"0","errorString":"PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target","downloadStatus":"DOWNLOAD_ERROR","downloadPath":"/mnt/SecStorage/73075a0a-38a1-3631-8170-8887c04f6073/template/tmpl/1/223/dnld9180711723601784047tmp_","installPath":"template/tmpl/1/223","templateSize":"(0 bytes) 0","templatePhySicalSize":"(0 bytes) 0","checkSum":"4dfb9d8be2191bc8bc4b89d78795a5b","result":"true","details":"PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target","wait":"0","bypassHostMaintenance":"false"}}] }
> >
> > 2024-02-15 06:26:18,937 ERROR [o.a.c.s.i.BaseImageStoreDriverImpl] (RemoteHostEndPoint-5:ctx-55063062) (logid:e21177cb) Failed to register template: b6e79c5a-38d4-4cf5-8606-e6f209b6b4c2 with error: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
> >
> > On Wed, Feb 14, 2024 at 11:27 PM Wei ZHOU <ustcweiz...@gmail.com> wrote:
> >
> > > Can you try 1.27.8 or 1.28.4 on https://download.cloudstack.org/cks/ ?
> > >
> > > -Wei
> > >
> > > On Thursday, February 15, 2024, Wally B <wvbauman...@gmail.com> wrote:
> > >
> > > > Hello Everyone!
> > > >
> > > > We are currently attempting to deploy k8s clusters and are running into issues with the deployment.
> > > >
> > > > Current CS Environment:
> > > >
> > > > CloudStack Version: 4.19.0 (Same issue before we upgraded from 4.18.1).
> > > > Hypervisor Type: Ubuntu 20.04.03 KVM
> > > > Attempted K8s Bins: 1.23.3, 1.27.3
> > > >
> > > > ======== ISSUE =========
> > > >
> > > > For some reason when we attempt the cluster provisioning all of the VMs start up, SSH Keys are installed, but then on at least 1, sometimes 2, of the VMs (control and/or worker) we get:
> > > >
> > > > [FAILED] Failed to start deploy-kube-system.service.
> > > > [FAILED] Failed to start Execute cloud user/final scripts.
> > > >
> > > > The CloudStack UI just says:
> > > > Create Kubernetes cluster test-cluster in progress
> > > > for about an hour (I assume this is the 3600 second timeout) and then fails.
> > > >
> > > > In the user's event log it stays on:
> > > > INFO KUBERNETES.CLUSTER.CREATE
> > > > Scheduled
> > > > Creating Kubernetes cluster. Cluster Id: XXX
> > > >
> > > > I can ssh into the VMs with their assigned private keys. I attempted to run the deploy-kube-system script but it just says already provisioned! I'm not sure how I would Execute cloud user/final scripts. If I attempt to stop the cluster and start it again nothing seems to change.
> > > >
> > > > Any help would be appreciated, I can provide any details as they are needed!
> > > >
> > > > Thanks!
> > > > Wally