Hi,

We're running Kudu version 1.5.0-cdh5.13.0.
We had another incident today due to memory running out, and Kudu is now coming back up slowly. I took a screenshot of the Kudu tablet server UI and would like to know what actually happens here. I can see more tablets slowly getting to the RUNNING state.

2017-11-01 22:01 GMT+01:00 Todd Lipcon <t...@cloudera.com>:

> Hi Janne,
>
> It's not clear whether the issue was that it was taking a long time to
> restart (i.e replaying WALs) or if somehow you also ended up having to
> re-replicate a bunch of tablets from host to host in the cluster. There
> were some bugs in earlier versions of Kudu (eg KUDU-2125, KUDU-2020) which
> could make this process rather slow to stabilize.
>
> If this issue happens again, running 'kudu cluster ksck' during the
> instable period can often yield more information to help understand what is
> happening.
>
> What version are you running?
>
> Todd
>
>
> On Wed, Nov 1, 2017 at 1:16 AM, Janne Keskitalo <janne.keskit...@paf.com>
> wrote:
>
>> Hi
>>
>> Our Kudu test environment got unresponsive yesterday for unknown reason.
>> It has three tablet servers and one master. It's running in AWS on quite
>> small host machines, so maybe some node ran out of memory or something. It
>> has happened before with this setup. Anyway, after we restarted kudu
>> service, we couldn't do any selects. From the tablet server UI I could see
>> it was initializing and bootstrapping tablets. It took many hours until all
>> tablets were in RUNNING-state.
>>
>> My question is where can I find information about these background
>> operations? I want to understand what happens in situations when some node
>> is offline and then comes back up after a while. What is tablet
>> initialization and bootstrapping, etc.
>>
>> --
>> Br.
>> Janne Keskitalo,
>> Database Architect, PAF.COM
>> For support: dbdsupp...@paf.com
>>
>>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

--
Br.
Janne Keskitalo,