Dear community,

After more than 2 months of running (with Casablanca), the APPC Ansible server went into the CrashLoopBackOff state. I tried to update its deployment, without success. One day later, other APPC pods went into the CrashLoopBackOff state too.
Now I am trying a simple deployment that consists only of the APPC service. Below is the state of the pods from the Kubernetes dashboard:

Describing "dev-appc-appc-db-0":

ubuntu@rancher:~/oom/kubernetes$ kubectl describe pod/dev-appc-appc-db-0 -n onap
Name:           dev-appc-appc-db-0
Namespace:      onap
Node:           k8s-dev/10.0.0.31
Start Time:     Wed, 27 Mar 2019 14:16:01 +0000
Labels:         app=dev-appc-appc-db
                controller-revision-hash=dev-appc-appc-db-7cf88ff6b7
                statefulset.kubernetes.io/pod-name=dev-appc-appc-db-0
Annotations:    pod.alpha.kubernetes.io/initialized=true
Status:         Running
IP:             10.42.5.166
Controlled By:  StatefulSet/dev-appc-appc-db
Init Containers:
  mariadb-galera-prepare:
    Container ID:  docker://610ebd70b379ac6e7a223ba7f4dfbb7b6becf2a8965f0850ad8b9cf11f28b520
    Image:         nexus3.onap.org:10001/busybox
    Image ID:      docker-pullable://nexus3.onap.org:10001/busybox@sha256:4415a904b1aca178c2450fd54928ab362825e863c0ad5452fd020e92f7a6a47e
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
      chown -R 27:27 /var/lib/mysql
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 27 Mar 2019 14:16:14 +0000
      Finished:     Wed, 27 Mar 2019 14:16:14 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/lib/mysql from dev-appc-appc-db-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-sbhz4 (ro)
Containers:
  appc-db:
    Container ID:   docker://03cfd676d76fc852dd7c75fb795e0223c731b4f4a22ee903f9784df70737d080
    Image:          nexus3.onap.org:10001/adfinissygroup/k8s-mariadb-galera-centos:v002
    Image ID:       docker-pullable://nexus3.onap.org:10001/adfinissygroup/k8s-mariadb-galera-centos@sha256:fbcb842f30065ae94532cb1af9bb03cc6e2acaaf896d87d0ec38da7dd09a3dde
    Ports:          3306/TCP, 4444/TCP, 4567/TCP, 4568/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP, 0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 27 Mar 2019 14:19:42 +0000
      Finished:     Wed, 27 Mar 2019 14:19:45 +0000
    Ready:          False
    Restart Count:  5
    Liveness:       exec [mysqladmin ping] delay=30s timeout=5s period=10s #success=1 #failure=3
    Readiness:      exec [/usr/share/container-scripts/mysql/readiness-probe.sh] delay=15s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAMESPACE:        onap (v1:metadata.namespace)
      MYSQL_USER:           my-user
      MYSQL_PASSWORD:       <set to the key 'user-password' in secret 'dev-appc-appc-db'>  Optional: false
      MYSQL_DATABASE:       my-database
      MYSQL_ROOT_PASSWORD:  <set to the key 'db-root-password' in secret 'dev-appc-appc-db'>  Optional: false
    Mounts:
      /etc/localtime from localtime (ro)
      /var/lib/mysql from dev-appc-appc-db-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-sbhz4 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  dev-appc-appc-db-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  dev-appc-appc-db-data-dev-appc-appc-db-0
    ReadOnly:   false
  localtime:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/localtime
    HostPathType:
  default-token-sbhz4:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-sbhz4
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age              From               Message
  ----     ------            ----             ----               -------
  Warning  FailedScheduling  4m (x3 over 4m)  default-scheduler  pod has unbound PersistentVolumeClaims
  Normal   Scheduled         4m               default-scheduler  Successfully assigned onap/dev-appc-appc-db-0 to k8s-dev
  Normal   Pulling           4m               kubelet, k8s-dev   pulling image "nexus3.onap.org:10001/busybox"
  Normal   Pulled            4m               kubelet, k8s-dev   Successfully pulled image "nexus3.onap.org:10001/busybox"
  Normal   Created           4m               kubelet, k8s-dev   Created container
  Normal   Started           4m               kubelet, k8s-dev   Started container
  Normal   Pulled            3m (x4 over 4m)  kubelet, k8s-dev   Container image "nexus3.onap.org:10001/adfinissygroup/k8s-mariadb-galera-centos:v002" already present on machine
  Normal   Created           3m (x4 over 4m)  kubelet, k8s-dev   Created container
  Normal   Started           3m (x4 over 4m)  kubelet, k8s-dev   Started container
  Warning  BackOff           3m (x9 over 4m)  kubelet, k8s-dev   Back-off restarting failed container

And the logs from the "appc-db" container:

+ CONTAINER_SCRIPTS_DIR=/usr/share/container-scripts/mysql
+ EXTRA_DEFAULTS_FILE=/etc/my.cnf.d/galera.cnf
+ '[' -z onap ']'
+ echo 'Galera: Finding peers'
Galera: Finding peers
++ hostname -f
++ cut -d. -f2
+ K8S_SVC_NAME=appc-dbhost
+ echo 'Using service name: appc-dbhost'
+ cp /usr/share/container-scripts/mysql/galera.cnf /etc/my.cnf.d/galera.cnf
Using service name: appc-dbhost
+ /usr/bin/peer-finder -on-start=/usr/share/container-scripts/mysql/configure-galera.sh -service=appc-dbhost
2019/03/27 15:27:45 Peer list updated was [] now [dev-appc-appc-db-0.appc-dbhost.onap.svc.cluster.local]
2019/03/27 15:27:45 execing: /usr/share/container-scripts/mysql/configure-galera.sh with stdin: dev-appc-appc-db-0.appc-dbhost.onap.svc.cluster.local
2019/03/27 15:27:45
2019/03/27 15:27:46 Peer finder exiting
+ '[' '!' -d /var/lib/mysql/mysql ']'
+ exec mysqld
2019-03-27 15:27:46 139879155882240 [Note] mysqld (mysqld 10.1.24-MariaDB) starting as process 1 ...
2019-03-27 15:27:47 139879155882240 [Note] WSREP: Read nil XID from storage engines, skipping position init
2019-03-27 15:27:47 139879155882240 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
2019-03-27 15:27:47 139879155882240 [Note] WSREP: wsrep_load(): Galera 25.3.20(r3703) by Codership Oy <[email protected]> loaded successfully.
2019-03-27 15:27:47 139879155882240 [Note] WSREP: CRC-32C: using hardware acceleration.
2019-03-27 15:27:47 139879155882240 [Note] WSREP: Found saved state: 84b0f5c0-12b6-11e9-a817-1b6ad3281ac6:-1, safe_to_bootsrap: 0
2019-03-27 15:27:47 139879155882240 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = dev-appc-appc-db-0.appc-dbhost.onap.svc.cluster.local; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_
2019-03-27 15:27:47 139879155882240 [Note] WSREP: GCache history reset: old(84b0f5c0-12b6-11e9-a817-1b6ad3281ac6:0) -> new(84b0f5c0-12b6-11e9-a817-1b6ad3281ac6:-1)
2019-03-27 15:27:47 139879155882240 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2019-03-27 15:27:47 139879155882240 [Note] WSREP: wsrep_sst_grab()
2019-03-27 15:27:47 139879155882240 [Note] WSREP: Start replication
2019-03-27 15:27:47 139879155882240 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2019-03-27 15:27:47 139879155882240 [ERROR] WSREP: It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluster and may not contain all the updates.
To force cluster bootstrap with this node, edit the grastate.dat file manually and set safe_to_bootstrap to 1 .
2019-03-27 15:27:47 139879155882240 [ERROR] WSREP: wsrep::connect(gcomm://) failed: 7
2019-03-27 15:27:47 139879155882240 [ERROR] Aborting

It seems to be related to the "safe_to_bootstrap" feature (some info here: http://galeracluster.com/2016/11/introducing-the-safe-to-bootstrap-feature-in-galera-cluster/ ). I am not sure how to change this parameter, or why deploying APPC now runs into this issue (I am using the same images, the same environment, everything; I only deploy/undeploy).

Would appreciate some help!

Kind regards,
Xoan
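
P.S. As far as I understand the error message, the manual edit it asks for would amount to something like the sketch below. It is only an illustration under assumptions: the helper name set_safe_to_bootstrap is mine, the sample file content assumes the usual "key: value" layout of grastate.dat (the uuid and seqno are taken from the "Found saved state" log line, the version line is a guess), and I have not run it against this deployment.

```python
import re


def set_safe_to_bootstrap(grastate_text: str, value: int = 1) -> str:
    """Return grastate.dat content with safe_to_bootstrap set to `value`.

    Hypothetical helper: it only rewrites text and does not touch any file.
    """
    # Match the whole "safe_to_bootstrap: <digits>" line, keeping the key part.
    pattern = re.compile(r"^(safe_to_bootstrap:[ \t]*)\d+[ \t]*$", re.MULTILINE)
    new_text, count = pattern.subn(
        lambda m: m.group(1) + str(value), grastate_text
    )
    if count != 1:
        # Refuse to guess if the file does not look the way we assumed.
        raise ValueError(
            f"expected exactly one safe_to_bootstrap line, found {count}"
        )
    return new_text


# Assumed file content; uuid and seqno come from the log, the rest is a guess.
sample = (
    "# GALERA saved state\n"
    "version: 2.1\n"
    "uuid:    84b0f5c0-12b6-11e9-a817-1b6ad3281ac6\n"
    "seqno:   -1\n"
    "safe_to_bootstrap: 0\n"
)

patched = set_safe_to_bootstrap(sample)
print(patched)
```

In the pod, the file would presumably be /var/lib/mysql/grastate.dat, since the log reports base_dir = /var/lib/mysql/ and that directory is mounted from the dev-appc-appc-db-data volume. Note that the error message itself warns that this node may not contain all the updates, so forcing the bootstrap is at one's own risk.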
