Hi all,

We're running into a few Kudu issues. The first is that the Kudu cluster check utility (sudo -u kudu /opt/cloudera/parcels/CDH/lib/kudu/bin-debug/kudu cluster ksck) shows:

    Connected to the Master
    Fetched info from all 10 Tablet Servers
    Tablet 41bf41e4127a46c69242f707298cf4ba of table 'xxx' is under-replicated: 1 replica(s) not RUNNING
      1b3d49dd6ce64acda32f97a89d7de193: TS unavailable
      1a05af887edf4ba7b5c1731ce3508b19 (pdn05:7050): RUNNING [LEADER]
      4028533287964369928034c3616a0a16 (pdn01:7050): RUNNING
    2 replicas' active configs differ from the master's.
      All the peers reported by the master and tablet servers are:
      A = 1a05af887edf4ba7b5c1731ce3508b19
      B = 1b3d49dd6ce64acda32f97a89d7de193
      C = 4028533287964369928034c3616a0a16
    The consensus matrix is:
    Segmentation fault

The Kudu 1.4.0 release notes mention a segmentation fault in ksck, but we are running 1.5.0 on a CDH cluster.

Some notes:

* All masters (we have 3) are up, and one of them has been elected leader.
* All 10 tablet servers are live and visible in the master web UI.
* We've run kudu fs check ... -repair on all servers (masters and tablet servers).
* The master logs are filled with errors like:

    Previously reported cstate for tablet 5977f01cea44448a908bb56f97b46d9e (table 'xxx' [id=bb359f4b89dd46e797e2e24f9efac971]) gave a different leader for term 2007 than the current cstate. Previous cstate: current_term: 2007 leader_uuid: ""

* The tablet server logs contain a lot of:

    Couldn't send request to peer 228515616baf44a99561c2b72dfb3bab for tablet 138854a04f804f4ebf42df657c22b995. Error code: TABLET_NOT_RUNNING (12). Status: Illegal state: Tablet not RUNNING: INITIALIZED. Retrying in the next heartbeat period. Already tried 12813 times.

We're a bit lost as to where to look next. If anyone can point us in the right direction, that would be great!

Thanks,
Vincent
