Hi

We're running version: kudu 1.5.0-cdh5.13.0

We had another incident today due to memory running out, and Kudu is now
coming back up slowly. I took a screenshot of the Kudu tablet server UI and
would like to know what actually happens here. I can see more tablets slowly
reaching the RUNNING state.


2017-11-01 22:01 GMT+01:00 Todd Lipcon <t...@cloudera.com>:

> Hi Janne,
>
> It's not clear whether the issue was that it was taking a long time to
> restart (i.e., replaying WALs) or whether you also ended up having to
> re-replicate a bunch of tablets from host to host in the cluster. There
> were some bugs in earlier versions of Kudu (e.g., KUDU-2125, KUDU-2020) that
> could make this process rather slow to stabilize.
>
> If this issue happens again, running 'kudu cluster ksck' during the
> unstable period can often yield more information to help understand what is
> happening.
>
> What version are you running?
>
> Todd
>
>
> On Wed, Nov 1, 2017 at 1:16 AM, Janne Keskitalo <janne.keskit...@paf.com>
> wrote:
>
>> Hi
>>
>> Our Kudu test environment became unresponsive yesterday for an unknown
>> reason. It has three tablet servers and one master. It's running in AWS on
>> quite small host machines, so maybe some node ran out of memory. It
>> has happened before with this setup. Anyway, after we restarted the Kudu
>> service, we couldn't do any selects. From the tablet server UI I could see
>> it was initializing and bootstrapping tablets. It took many hours until all
>> tablets were in the RUNNING state.
>>
>> My question is: where can I find information about these background
>> operations? I want to understand what happens when some node
>> is offline and then comes back up after a while. For example, what are
>> tablet initialization and bootstrapping?
>>
>> --
>> Br.
>> Janne Keskitalo,
>> Database Architect, PAF.COM
>> For support: dbdsupp...@paf.com
>>
>>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



-- 
Br.
Janne Keskitalo,
