MariaDB Galera pods are failing with connection timeout errors and repeatedly 
go into CrashLoopBackOff.

Interestingly, if the *data0, data1 and data2* directories are deleted from the 
*/dockerdata-nfs/rel-mariadb-galera* folder, the MariaDB cluster pods come up 
successfully, but after a few hours they go into the error state again.

The *SO* pods are also not coming up because the MariaDB cluster is in an error state.

See the logs below from one of the pods. Similar errors are present in the other pods' logs.

+ CONTAINER_SCRIPTS_DIR=/usr/share/container-scripts/mysql
+ EXTRA_DEFAULTS_FILE=/etc/my.cnf.d/galera.cnf
+ '[' -z onap ']'
+ echo 'Galera: Finding peers'
Galera: Finding peers
++ hostname -f
++ cut -d. -f2
+ K8S_SVC_NAME=mariadb-galera
+ echo 'Using service name: mariadb-galera'
+ cp /usr/share/container-scripts/mysql/galera.cnf /etc/my.cnf.d/galera.cnf
Using service name: mariadb-galera
+ /usr/bin/peer-finder 
-on-start=/usr/share/container-scripts/mysql/configure-galera.sh 
-service=mariadb-galera
2019/07/17 03:35:22 Peer list updated
was []
now [dev-mariadb-galera-mariadb-galera-0.mariadb-galera.onap.svc.cluster.local 
dev-mariadb-galera-mariadb-galera-1.mariadb-galera.onap.svc.cluster.local 
dev-mariadb-galera-mariadb-galera-2.mariadb-galera.onap.svc.cluster.local]
2019/07/17 03:35:22 execing: 
/usr/share/container-scripts/mysql/configure-galera.sh with stdin: 
dev-mariadb-galera-mariadb-galera-0.mariadb-galera.onap.svc.cluster.local
dev-mariadb-galera-mariadb-galera-1.mariadb-galera.onap.svc.cluster.local
dev-mariadb-galera-mariadb-galera-2.mariadb-galera.onap.svc.cluster.local
2019/07/17 03:35:22 
2019/07/17 03:35:23 Peer finder exiting
+ '[' '!' -d /var/lib/mysql/mysql ']'
+ exec mysqld
2019-07-17  3:35:23 140449607362816 [Note] mysqld (mysqld 10.1.24-MariaDB) 
starting as process 1 ...
2019-07-17  3:35:23 140449607362816 [Note] WSREP: Read nil XID from storage 
engines, skipping position init
2019-07-17  3:35:23 140449607362816 [Note] WSREP: wsrep_load(): loading 
provider library '/usr/lib64/galera/libgalera_smm.so'
2019-07-17  3:35:23 140449607362816 [Note] WSREP: wsrep_load(): Galera 
25.3.20(r3703) by Codership Oy <[email protected]> loaded successfully.
2019-07-17  3:35:23 140449607362816 [Note] WSREP: CRC-32C: using hardware 
acceleration.
2019-07-17  3:35:23 140449607362816 [Note] WSREP: Found saved state: 
5b27b8a6-a77d-11e9-a00b-26960ebe383d:-1, safe_to_bootsrap: 0
2019-07-17  3:35:23 140449607362816 [Note] WSREP: Passing config to GCS: 
base_dir = /var/lib/mysql/; base_host = 
dev-mariadb-galera-mariadb-galera-0.mariadb-galera.onap.svc.cluster.local; 
base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; 
evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; 
evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; 
evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 
4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; 
evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = 
/var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = 
/var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; 
gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 
1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; 
gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; 
gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; 
gmcast.versi
2019-07-17  3:35:23 140449607362816 [Note] WSREP: GCache history reset: 
old(5b27b8a6-a77d-11e9-a00b-26960ebe383d:0) -> new 
(5b27b8a6-a77d-11e9-a00b-26960ebe383d:-1)
2019-07-17  3:35:23 140449607362816 [Note] WSREP: Assign initial position for 
certification: -1, protocol version: -1
2019-07-17  3:35:23 140449607362816 [Note] WSREP: wsrep_sst_grab()
2019-07-17  3:35:23 140449607362816 [Note] WSREP: Start replication
2019-07-17  3:35:23 140449607362816 [Note] WSREP: Setting initial position to 
00000000-0000-0000-0000-000000000000:-1
2019-07-17  3:35:23 140449607362816 [Note] WSREP: protonet asio version 0
2019-07-17  3:35:23 140449607362816 [Note] WSREP: Using CRC-32C for message 
checksums.
2019-07-17  3:35:23 140449607362816 [Note] WSREP: backend: asio
2019-07-17  3:35:23 140449607362816 [Note] WSREP: gcomm thread scheduling 
priority set to other:0 
2019-07-17  3:35:23 140449607362816 [Warning] WSREP: access 
file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2019-07-17  3:35:23 140449607362816 [Note] WSREP: restore pc from disk failed
2019-07-17  3:35:23 140449607362816 [Note] WSREP: GMCast version 0
2019-07-17  3:35:23 140449607362816 [Note] WSREP: (e8fea524, 
'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2019-07-17  3:35:23 140449607362816 [Note] WSREP: (e8fea524, 
'tcp://0.0.0.0:4567') multicast: , ttl: 1
2019-07-17  3:35:23 140449607362816 [Note] WSREP: EVS version 0
2019-07-17  3:35:23 140449607362816 [Note] WSREP: gcomm: connecting to group 
'mariadb-galera', peer 
'dev-mariadb-galera-mariadb-galera-0.mariadb-galera.onap.svc.cluster.local:,dev-mariadb-galera-mariadb-galera-1.mariadb-galera.onap.svc.cluster.local:,dev-mariadb-galera-mariadb-galera-2.mariadb-galera.onap.svc.cluster.local:'
2019-07-17  3:35:23 140449607362816 [Note] WSREP: (e8fea524, 
'tcp://0.0.0.0:4567') connection established to e8fea524 tcp://10.42.6.57:4567
2019-07-17  3:35:23 140449607362816 [Warning] WSREP: (e8fea524, 
'tcp://0.0.0.0:4567') address 'tcp://10.42.6.57:4567' points to own listening 
address, blacklisting
2019-07-17  3:35:23 140449607362816 [Note] WSREP: (e8fea524, 
'tcp://0.0.0.0:4567') connection established to d87d55d6 tcp://10.42.5.67:4567
2019-07-17  3:35:23 140449607362816 [Note] WSREP: (e8fea524, 
'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 
2019-07-17  3:35:24 140449607362816 [Note] WSREP: declaring d87d55d6 at 
tcp://10.42.5.67:4567 stable
2019-07-17  3:35:24 140449607362816 [Warning] WSREP: no nodes coming from prim 
view, prim not possible
2019-07-17  3:35:24 140449607362816 [Note] WSREP: 
view(view_id(NON_PRIM,d87d55d6,2) memb {
        d87d55d6,0
        e8fea524,0
} joined {
} left {
} partitioned {
})
2019-07-17  3:35:26 140449607362816 [Note] WSREP: (e8fea524, 
'tcp://0.0.0.0:4567') connection to peer e8fea524 with addr tcp://10.42.6.57:4567 
timed out, no messages seen in PT3S
2019-07-17  3:35:27 140449607362816 [Note] WSREP: (e8fea524, 
'tcp://0.0.0.0:4567') turning message relay requesting off
2019-07-17  3:35:29 140449607362816 [Note] WSREP: (e8fea524, 
'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 
tcp://10.42.5.67:4567 
2019-07-17  3:35:30 140449607362816 [Note] WSREP: (e8fea524, 
'tcp://0.0.0.0:4567') reconnecting to d87d55d6 (tcp://10.42.5.67:4567), attempt 0
2019-07-17  3:35:33 140449607362816 [Note] WSREP: evs::proto(e8fea524, 
OPERATIONAL, view_id(REG,d87d55d6,2)) suspecting node: d87d55d6
2019-07-17  3:35:33 140449607362816 [Note] WSREP: evs::proto(e8fea524, 
OPERATIONAL, view_id(REG,d87d55d6,2)) suspected node without join message, 
declaring inactive
2019-07-17  3:35:34 140449607362816 [Note] WSREP: 
view(view_id(NON_PRIM,d87d55d6,2) memb {
        e8fea524,0
} joined {
} left {
} partitioned {
        d87d55d6,0
})
2019-07-17  3:35:34 140449607362816 [Warning] WSREP: no nodes coming from prim 
view, prim not possible
2019-07-17  3:35:34 140449607362816 [Note] WSREP: 
view(view_id(NON_PRIM,e8fea524,3) memb {
        e8fea524,0
} joined {
} left {
} partitioned {
        d87d55d6,0
})
2019-07-17  3:35:34 140449607362816 [Note] WSREP: (e8fea524, 
'tcp://0.0.0.0:4567') connection established to ef8a1c31 tcp://10.42.5.67:4567
2019-07-17  3:35:34 140449607362816 [Note] WSREP: remote endpoint 
tcp://10.42.5.67:4567 changed identity d87d55d6 -> ef8a1c31
2019-07-17  3:35:35 140449607362816 [Note] WSREP: declaring ef8a1c31 at 
tcp://10.42.5.67:4567 stable
2019-07-17  3:35:35 140449607362816 [Warning] WSREP: no nodes coming from prim 
view, prim not possible
2019-07-17  3:35:35 140449607362816 [Note] WSREP: 
view(view_id(NON_PRIM,e8fea524,4) memb {
        e8fea524,0
        ef8a1c31,0
} joined {
} left {
} partitioned {
        d87d55d6,0
})
2019-07-17  3:35:37 140449607362816 [Note] WSREP: (e8fea524, 
'tcp://0.0.0.0:4567') turning message relay requesting off
2019-07-17  3:35:54 140449607362816 [ERROR] WSREP: failed to open gcomm backend 
connection: 110: failed to reach primary view: 110 (Connection timed out)
         at gcomm/src/pc.cpp:connect():158
2019-07-17  3:35:54 140449607362816 [ERROR] WSREP: 
gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: 
-110 (Connection timed out)
2019-07-17  3:35:54 140449607362816 [ERROR] WSREP: 
gcs/src/gcs.cpp:gcs_open():1404: Failed to open channel 'mariadb-galera' at 
'gcomm://dev-mariadb-galera-mariadb-galera-0.mariadb-galera.onap.svc.cluster.local,dev-mariadb-galera-mariadb-galera-1.mariadb-galera.onap.svc.cluster.local,dev-mariadb-galera-mariadb-galera-2.mariadb-galera.onap.svc.cluster.local': 
-110 (Connection timed out)
2019-07-17  3:35:54 140449607362816 [ERROR] WSREP: gcs connect failed: 
Connection timed out
2019-07-17  3:35:54 140449607362816 [ERROR] WSREP: 
wsrep::connect(gcomm://dev-mariadb-galera-mariadb-galera-0.mariadb-galera.onap.svc.cluster.local,dev-mariadb-galera-mariadb-galera-1.mariadb-galera.onap.svc.cluster.local,dev-mariadb-galera-mariadb-galera-2.mariadb-galera.onap.svc.cluster.local) 
failed: 7
2019-07-17  3:35:54 140449607362816 [ERROR] Aborting
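
For reference, the saved state that mysqld reports at startup (the "Found saved 
state" line in the log above) is read from Galera's grastate.dat file. Below is a 
minimal Python sketch for comparing that state across the three data directories. 
It assumes grastate.dat sits at the top level of each of data0, data1 and data2 
under /dockerdata-nfs/rel-mariadb-galera; that layout is an assumption about this 
deployment, and the helper names parse_grastate and report are made up for 
illustration.

```python
import os

BASE_DIR = "/dockerdata-nfs/rel-mariadb-galera"  # folder named in this report
DATA_DIRS = ["data0", "data1", "data2"]           # directories named in this report


def parse_grastate(text):
    """Parse the 'key: value' lines of a grastate.dat file into a dict of strings."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '# GALERA saved state' comment header.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            state[key.strip()] = value.strip()
    return state


def report(base_dir=BASE_DIR, data_dirs=DATA_DIRS):
    """Print uuid, seqno and safe_to_bootstrap for each data directory."""
    for name in data_dirs:
        # Assumption: grastate.dat is at the top level of each data directory.
        path = os.path.join(base_dir, name, "grastate.dat")
        if not os.path.isfile(path):
            print(f"{name}: no grastate.dat found at {path}")
            continue
        with open(path) as f:
            state = parse_grastate(f.read())
        print(
            f"{name}: uuid={state.get('uuid')} seqno={state.get('seqno')} "
            f"safe_to_bootstrap={state.get('safe_to_bootstrap')}"
        )


if __name__ == "__main__":
    report()
```

Running this on the NFS host prints one line per directory, so the values can be 
compared side by side with the "Found saved state" line each pod logs.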
