Sorry, I meant taking a 1.6 Kudu tool and running it against your 1.5 cluster.
-Will

On Mon, Aug 20, 2018 at 11:19 AM William Berkeley <[email protected]> wrote:

> That looks like KUDU-2113, which was fixed in 1.6.0.
>
> It happens if the tablet servers report peers in their config that are not
> known to the master. Probably, you have removed servers from the cluster
> and some of the tablets are in a bad state as a result. These sorts of
> problems were unfortunately common on earlier Kudu releases. Every new
> version since 5.12 had made significant improvements to prevent these sorts
> of situations. I'd recommend upgrading to 1.5, or at least taking a 1.5
> kudu tool and running it against the 1.4 cluster to see what the issues are.
>
> -Will
>
> On Mon, Aug 20, 2018 at 10:57 AM, Vincent Kooijman <[email protected]> wrote:
>
>> Hi all,
>>
>> We're running into a few Kudu issues with the first being the Kudu
>> cluster check utility (sudo -u kudu
>> /opt/cloudera/parcels/CDH/lib/kudu/bin-debug/kudu cluster ksck) showing:
>>
>> Connected to the Master
>> Fetched info from all 10 Tablet Servers
>>
>> Tablet 41bf41e4127a46c69242f707298cf4ba of table 'xxx' is
>> under-replicated: 1 replica(s) not RUNNING
>> 1b3d49dd6ce64acda32f97a89d7de193: TS unavailable
>> 1a05af887edf4ba7b5c1731ce3508b19 (pdn05:7050): RUNNING [LEADER]
>> 4028533287964369928034c3616a0a16 (pdn01:7050): RUNNING
>>
>> 2 replicas' active configs differ from the master's.
>> All the peers reported by the master and tablet servers are:
>> A = 1a05af887edf4ba7b5c1731ce3508b19
>> B = 1b3d49dd6ce64acda32f97a89d7de193
>> C = 4028533287964369928034c3616a0a16
>>
>> *The consensus matrix is:*
>> *Segmentation fault*
>>
>> There is some mention of segmentation fault in combination with ksck in
>> the Kudu release notes for 1.4.0, but we are running 1.5.0 on a CDH cluster.
>>
>> Some notes:
>>
>> - All masters (we have 3) are up with one leader being elected
>> - All tablet servers (10) are live and visible in the master web UI
>> - We've ran kudu fs check ... -repair on all servers (master & tablet)
>> - Master logs are filled with errors like:
>>
>> Previously reported cstate for tablet
>> 5977f01cea44448a908bb56f97b46d9e (table 'xxx'
>> [id=bb359f4b89dd46e797e2e24f9efac971]) gave a different leader for term
>> 2007 than the current cstate. Previous cstate: current_term: 2007
>> leader_uuid: ""
>>
>> - And tablet server logs contain a lot of:
>>
>> Couldn't send request to peer 228515616baf44a99561c2b72dfb3bab for
>> tablet 138854a04f804f4ebf42df657c22b995. Error code: TABLET_NOT_RUNNING
>> (12). Status: Illegal state: Tablet not RUNNING: INITIALIZED. Retrying in
>> the next heartbeat period. Already tried 12813 times.
>>
>> We're a bit lost as to where to look next.
>>
>> If anyone can point us in the right direction, that would be great!
>>
>> Thanks,
>>
>> Vincent
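
For anyone following the suggestion at the top of this thread (running a newer kudu tool against the older cluster), a rough sketch of what that might look like is below. The binary path and master hostnames are placeholders that do not come from this thread, and 7051 is only the default master RPC port, so adjust both to your setup. The script prints the command by default and only runs it when RUN=1 is set.

```shell
#!/usr/bin/env bash
# Sketch only: KUDU_BIN and MASTERS are hypothetical placeholders,
# not values taken from the thread.
KUDU_BIN="${KUDU_BIN:-/path/to/kudu-1.6/bin/kudu}"            # assumed location of a 1.6 kudu binary
MASTERS="${MASTERS:-master1:7051,master2:7051,master3:7051}"  # assumed master RPC addresses

# Print the ksck invocation so it can be reviewed before running.
ksck_cmd() {
  printf '%s cluster ksck %s\n' "$KUDU_BIN" "$MASTERS"
}

# Dry run by default; set RUN=1 to actually run the check as the kudu user.
if [ "${RUN:-0}" = "1" ]; then
  sudo -u kudu "$KUDU_BIN" cluster ksck "$MASTERS"
else
  ksck_cmd
fi
```

The check is run as the kudu user to mirror the invocation quoted above. The newer binary only needs to be unpacked somewhere on a host that can reach the masters; it does not have to replace the installed one.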
