MariaDB Galera pods are failing with connection timeout errors and repeatedly go into CrashLoopBackOff.
Interestingly, if the *data0*, *data1* and *data2* directories are deleted from the */dockerdata-nfs/rel-mariadb-galera* folder, the MariaDB cluster pods come up successfully, but after a few hours they go back into the error state. The *SO* pods are also not coming up because the MariaDB cluster is in the error state. See the logs below from one of the pods. Similar errors are present in the other pods' logs.

+ CONTAINER_SCRIPTS_DIR=/usr/share/container-scripts/mysql
+ EXTRA_DEFAULTS_FILE=/etc/my.cnf.d/galera.cnf
+ '[' -z onap ']'
+ echo 'Galera: Finding peers'
Galera: Finding peers
++ hostname -f
++ cut -d. -f2
+ K8S_SVC_NAME=mariadb-galera
+ echo 'Using service name: mariadb-galera'
+ cp /usr/share/container-scripts/mysql/galera.cnf /etc/my.cnf.d/galera.cnf
Using service name: mariadb-galera
+ /usr/bin/peer-finder -on-start=/usr/share/container-scripts/mysql/configure-galera.sh -service=mariadb-galera
2019/07/17 03:35:22 Peer list updated was [] now [dev-mariadb-galera-mariadb-galera-0.mariadb-galera.onap.svc.cluster.local dev-mariadb-galera-mariadb-galera-1.mariadb-galera.onap.svc.cluster.local dev-mariadb-galera-mariadb-galera-2.mariadb-galera.onap.svc.cluster.local]
2019/07/17 03:35:22 execing: /usr/share/container-scripts/mysql/configure-galera.sh with stdin: dev-mariadb-galera-mariadb-galera-0.mariadb-galera.onap.svc.cluster.local dev-mariadb-galera-mariadb-galera-1.mariadb-galera.onap.svc.cluster.local dev-mariadb-galera-mariadb-galera-2.mariadb-galera.onap.svc.cluster.local
2019/07/17 03:35:22
2019/07/17 03:35:23 Peer finder exiting
+ '[' '!' -d /var/lib/mysql/mysql ']'
+ exec mysqld
2019-07-17 3:35:23 140449607362816 [Note] mysqld (mysqld 10.1.24-MariaDB) starting as process 1 ...
2019-07-17 3:35:23 140449607362816 [Note] WSREP: Read nil XID from storage engines, skipping position init
2019-07-17 3:35:23 140449607362816 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
2019-07-17 3:35:23 140449607362816 [Note] WSREP: wsrep_load(): Galera 25.3.20(r3703) by Codership Oy <[email protected]> loaded successfully.
2019-07-17 3:35:23 140449607362816 [Note] WSREP: CRC-32C: using hardware acceleration.
2019-07-17 3:35:23 140449607362816 [Note] WSREP: Found saved state: 5b27b8a6-a77d-11e9-a00b-26960ebe383d:-1, safe_to_bootsrap: 0
2019-07-17 3:35:23 140449607362816 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = dev-mariadb-galera-mariadb-galera-0.mariadb-galera.onap.svc.cluster.local; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.versi
2019-07-17 3:35:23 140449607362816 [Note] WSREP: GCache history reset: old(5b27b8a6-a77d-11e9-a00b-26960ebe383d:0) -> new (5b27b8a6-a77d-11e9-a00b-26960ebe383d:-1)
2019-07-17 3:35:23 140449607362816 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2019-07-17 3:35:23 140449607362816 [Note] WSREP: wsrep_sst_grab()
2019-07-17 3:35:23 140449607362816 [Note] WSREP: Start replication
2019-07-17 3:35:23 140449607362816 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2019-07-17 3:35:23 140449607362816 [Note] WSREP: protonet asio version 0
2019-07-17 3:35:23 140449607362816 [Note] WSREP: Using CRC-32C for message checksums.
2019-07-17 3:35:23 140449607362816 [Note] WSREP: backend: asio
2019-07-17 3:35:23 140449607362816 [Note] WSREP: gcomm thread scheduling priority set to other:0
2019-07-17 3:35:23 140449607362816 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2019-07-17 3:35:23 140449607362816 [Note] WSREP: restore pc from disk failed
2019-07-17 3:35:23 140449607362816 [Note] WSREP: GMCast version 0
2019-07-17 3:35:23 140449607362816 [Note] WSREP: (e8fea524, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2019-07-17 3:35:23 140449607362816 [Note] WSREP: (e8fea524, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2019-07-17 3:35:23 140449607362816 [Note] WSREP: EVS version 0
2019-07-17 3:35:23 140449607362816 [Note] WSREP: gcomm: connecting to group 'mariadb-galera', peer 'dev-mariadb-galera-mariadb-galera-0.mariadb-galera.onap.svc.cluster.local:,dev-mariadb-galera-mariadb-galera-1.mariadb-galera.onap.svc.cluster.local:,dev-mariadb-galera-mariadb-galera-2.mariadb-galera.onap.svc.cluster.local:'
2019-07-17 3:35:23 140449607362816 [Note] WSREP: (e8fea524, 'tcp://0.0.0.0:4567') connection established to e8fea524 tcp://10.42.6.57:4567
2019-07-17 3:35:23 140449607362816 [Warning] WSREP: (e8fea524, 'tcp://0.0.0.0:4567') address 'tcp://10.42.6.57:4567' points to own listening address, blacklisting
2019-07-17 3:35:23 140449607362816 [Note] WSREP: (e8fea524, 'tcp://0.0.0.0:4567') connection established to d87d55d6 tcp://10.42.5.67:4567
2019-07-17 3:35:23 140449607362816 [Note] WSREP: (e8fea524, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2019-07-17 3:35:24 140449607362816 [Note] WSREP: declaring d87d55d6 at tcp://10.42.5.67:4567 stable
2019-07-17 3:35:24 140449607362816 [Warning] WSREP: no nodes coming from prim view, prim not possible
2019-07-17 3:35:24 140449607362816 [Note] WSREP: view(view_id(NON_PRIM,d87d55d6,2) memb { d87d55d6,0 e8fea524,0 } joined { } left { } partitioned { })
2019-07-17 3:35:26 140449607362816 [Note] WSREP: (e8fea524, 'tcp://0.0.0.0:4567') connection to peer e8fea524 with addr tcp://10.42.6.57:4567 timed out, no messages seen in PT3S
2019-07-17 3:35:27 140449607362816 [Note] WSREP: (e8fea524, 'tcp://0.0.0.0:4567') turning message relay requesting off
2019-07-17 3:35:29 140449607362816 [Note] WSREP: (e8fea524, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.42.5.67:4567
2019-07-17 3:35:30 140449607362816 [Note] WSREP: (e8fea524, 'tcp://0.0.0.0:4567') reconnecting to d87d55d6 (tcp://10.42.5.67:4567), attempt 0
2019-07-17 3:35:33 140449607362816 [Note] WSREP: evs::proto(e8fea524, OPERATIONAL, view_id(REG,d87d55d6,2)) suspecting node: d87d55d6
2019-07-17 3:35:33 140449607362816 [Note] WSREP: evs::proto(e8fea524, OPERATIONAL, view_id(REG,d87d55d6,2)) suspected node without join message, declaring inactive
2019-07-17 3:35:34 140449607362816 [Note] WSREP: view(view_id(NON_PRIM,d87d55d6,2) memb { e8fea524,0 } joined { } left { } partitioned { d87d55d6,0 })
2019-07-17 3:35:34 140449607362816 [Warning] WSREP: no nodes coming from prim view, prim not possible
2019-07-17 3:35:34 140449607362816 [Note] WSREP: view(view_id(NON_PRIM,e8fea524,3) memb { e8fea524,0 } joined { } left { } partitioned { d87d55d6,0 })
2019-07-17 3:35:34 140449607362816 [Note] WSREP: (e8fea524, 'tcp://0.0.0.0:4567') connection established to ef8a1c31 tcp://10.42.5.67:4567
2019-07-17 3:35:34 140449607362816 [Note] WSREP: remote endpoint tcp://10.42.5.67:4567 changed identity d87d55d6 -> ef8a1c31
2019-07-17 3:35:35 140449607362816 [Note] WSREP: declaring ef8a1c31 at tcp://10.42.5.67:4567 stable
2019-07-17 3:35:35 140449607362816 [Warning] WSREP: no nodes coming from prim view, prim not possible
2019-07-17 3:35:35 140449607362816 [Note] WSREP: view(view_id(NON_PRIM,e8fea524,4) memb { e8fea524,0 ef8a1c31,0 } joined { } left { } partitioned { d87d55d6,0 })
2019-07-17 3:35:37 140449607362816 [Note] WSREP: (e8fea524, 'tcp://0.0.0.0:4567') turning message relay requesting off
2019-07-17 3:35:54 140449607362816 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out) at gcomm/src/pc.cpp:connect():158
2019-07-17 3:35:54 140449607362816 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
2019-07-17 3:35:54 140449607362816 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1404: Failed to open channel 'mariadb-galera' at 'gcomm://dev-mariadb-galera-mariadb-galera-0.mariadb-galera.onap.svc.cluster.local,dev-mariadb-galera-mariadb-galera-1.mariadb-galera.onap.svc.cluster.local,dev-mariadb-galera-mariadb-galera-2.mariadb-galera.onap.svc.cluster.local': -110 (Connection timed out)
2019-07-17 3:35:54 140449607362816 [ERROR] WSREP: gcs connect failed: Connection timed out
2019-07-17 3:35:54 140449607362816 [ERROR] WSREP: wsrep::connect(gcomm://dev-mariadb-galera-mariadb-galera-0.mariadb-galera.onap.svc.cluster.local,dev-mariadb-galera-mariadb-galera-1.mariadb-galera.onap.svc.cluster.local,dev-mariadb-galera-mariadb-galera-2.mariadb-galera.onap.svc.cluster.local) failed: 7
2019-07-17 3:35:54 140449607362816 [ERROR] Aborting
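For comparing the logs of the three pods, a small helper along the following lines could pull out the saved Galera state and the sequence of cluster views from each log. This is only a rough sketch: the function name `summarize_galera_log` is hypothetical, and the regular expressions assume the exact wording of the log lines in this post (including the `safe_to_bootsrap` spelling printed by this Galera version).

```python
import re

# Saved-state line, e.g.
#   Found saved state: <uuid>:<seqno>, safe_to_bootsrap: 0
# The misspelled key is copied from the log output in this post.
SAVED_STATE_RE = re.compile(
    r"Found saved state: (?P<uuid>[0-9a-f-]+):(?P<seqno>-?\d+), "
    r"safe_to_bootsrap: (?P<safe>\d)"
)

# Cluster view announcement, e.g.
#   view(view_id(NON_PRIM,d87d55d6,2) memb { d87d55d6,0 e8fea524,0 } ...
VIEW_RE = re.compile(
    r"view\(view_id\((?P<kind>[A-Z_]+),(?P<owner>[0-9a-f]+),(?P<num>\d+)\)\s*"
    r"memb \{(?P<members>[^}]*)\}"
)


def summarize_galera_log(text):
    """Extract the saved state and the list of views from one pod's log text."""
    summary = {"saved_state": None, "views": [], "reached_primary": False}

    state = SAVED_STATE_RE.search(text)
    if state:
        summary["saved_state"] = {
            "uuid": state.group("uuid"),
            "seqno": int(state.group("seqno")),
            "safe_to_bootstrap": state.group("safe") == "1",
        }

    for view in VIEW_RE.finditer(text):
        # Each member is printed as "<node-id>,<segment>"; keep only the id.
        members = [tok.split(",")[0] for tok in view.group("members").split()]
        summary["views"].append((view.group("kind"), int(view.group("num")), members))
        if view.group("kind") == "PRIM":
            summary["reached_primary"] = True

    return summary
```

Run against the log above, this would report a saved seqno of -1 with safe_to_bootstrap false, and only NON_PRIM views, which matches the "failed to reach primary view" error at the end.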
