Re: Data inconsistency after restart

Andrew Wong Tue, 05 Dec 2017 16:32:40 -0800

>
> How did you verify that all the data was inserted and how did you find
> some data missing? I'm wondering if it's possible that the initial
> "missing" data was data that Kudu was still in the process of inserting
> (albeit slowly, due to memory backpressure or somesuch).
>


Following up on what I posted, take a look at
https://kudu.apache.org/docs/transaction_semantics.html#_read_operations_scans.
It seems quite possible that not all of the rows had finished being
inserted when you ran the count, or that the scans were sent to a stale
replica.
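
One way to rule out the "still inserting" case before concluding that rows
are missing is to repeat the count until it stops changing. Below is a
minimal sketch of that idea, not a Kudu API: `count_rows` is a hypothetical
callable you would supply yourself (for example, a wrapper that runs
count(*) through Impala), and the retry parameters are arbitrary
assumptions. A stable count does not by itself rule out a stale replica.

```python
import time
from typing import Callable, Optional


def wait_for_stable_count(
    count_rows: Callable[[], int],  # hypothetical: returns the current row count
    required_matches: int = 3,      # assumption: identical reads needed in a row
    interval_s: float = 5.0,        # assumption: pause between reads
    max_attempts: int = 60,
) -> int:
    """Poll count_rows() until it returns the same value
    required_matches times in a row, then return that value."""
    last: Optional[int] = None
    streak = 0
    for _ in range(max_attempts):
        current = count_rows()
        if current == last:
            streak += 1
        else:
            # The count moved, so start a new streak from this value.
            last, streak = current, 1
        if streak >= required_matches:
            return current
        time.sleep(interval_s)
    raise TimeoutError("row count did not stabilize within max_attempts reads")
```

If the count keeps growing between reads, the ingest has most likely not
finished applying yet, and comparing against the expected total at that
point would report rows as missing that arrive later.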

On Tue, Dec 5, 2017 at 4:18 PM, Andrew Wong <[email protected]> wrote:

> Hi Petter,
>
>> When we verified that all data was inserted we found that some data was
>> missing. We added this missing data and on some chunks we got the
>> information that all rows were already present, i.e. Impala says something
>> like Modified: 0 rows, nnnnnnn errors. Doing the verification again now
>> shows that the Kudu table is complete. So, even though we did not insert
>> any data on some chunks, a count(*) operation over these chunks now returns
>> a different value.
>
>
> How did you verify that all the data was inserted and how did you find
> some data missing? I'm wondering if it's possible that the initial
> "missing" data was data that Kudu was still in the process of inserting
> (albeit slowly, due to memory backpressure or somesuch).
>
>> Now to my question. Will data be inconsistent if we recycle Kudu after
>> seeing soft memory limit warnings?
>
>
> Your data should be consistently written, even with those warnings. AFAIK
> they would cause a bit of slowness, not incorrect results.
>
>> Is there a way to tell when it is safe to restart Kudu to avoid these
>> issues? Should we use any special procedure when restarting (e.g. only
>> restart the tablet servers, only restart one tablet server at a time or
>> something like that)?
>
>
> In general, you can use the `ksck` tool to check the health of your
> cluster. See
> https://kudu.apache.org/docs/command_line_tools_reference.html#cluster-ksck
> for more details. For restarting a cluster, I would recommend taking down
> all tablet servers at once, otherwise tablet replicas may try to replicate
> data from the server that was taken down.
>
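
To make the `ksck` check above concrete, here is a rough sketch of running
it from a script before and after a restart. The command form
`kudu cluster ksck <master_addresses>` is the one from the linked reference;
the host names are placeholders, and treating a non-zero exit status as
"not healthy" is an assumption you should confirm against your Kudu
version.

```python
import subprocess
from typing import List, Sequence


def build_ksck_command(master_addresses: Sequence[str]) -> List[str]:
    """Build the argument list for `kudu cluster ksck`.
    The masters are passed as a single comma-separated argument."""
    return ["kudu", "cluster", "ksck", ",".join(master_addresses)]


def cluster_is_healthy(master_addresses: Sequence[str], timeout_s: int = 300) -> bool:
    """Run ksck and report whether it exited cleanly.
    Assumption: ksck exits with a non-zero status when it finds problems."""
    result = subprocess.run(
        build_ksck_command(master_addresses),
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    if result.returncode != 0:
        # Show the report so the unhealthy tablets can be inspected.
        print(result.stdout)
        print(result.stderr)
    return result.returncode == 0


if __name__ == "__main__":
    # Placeholder host names; replace with your own master addresses.
    masters = ["master-1.example.com:7051", "master-2.example.com:7051"]
    print("healthy" if cluster_is_healthy(masters) else "not healthy")
```

Running this before taking the tablet servers down, and again once they are
back up, gives a simple signal for whether the restart can be considered
finished.
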
> Hope this helped,
> Andrew
>
> On Tue, Dec 5, 2017 at 10:42 AM, Petter von Dolwitz (Hem) <
> [email protected]> wrote:
>
>> Hi Kudu users,
>>
>> We just started to use Kudu (1.4.0+cdh5.12.1). To make a baseline for
>> evaluation we ingested 3 months' worth of data. During ingestion we were
>> facing messages from the maintenance threads that a soft memory limit was
>> reached. It seems like the background maintenance threads stopped
>> performing their tasks at this point in time. It also seems like the
>> memory was never recovered even after stopping ingestion so I guess there
>> was a large backlog being built up. I guess the root cause here is that we
>> were a bit too conservative when giving Kudu memory. After a restart a
>> lot of maintenance tasks were started (i.e. compaction).
>>
>> When we verified that all data was inserted we found that some data was
>> missing. We added this missing data and on some chunks we got the
>> information that all rows were already present, i.e. Impala says something
>> like Modified: 0 rows, nnnnnnn errors. Doing the verification again now
>> shows that the Kudu table is complete. So, even though we did not insert
>> any data on some chunks, a count(*) operation over these chunks now returns
>> a different value.
>>
>> Now to my question. Will data be inconsistent if we recycle Kudu after
>> seeing soft memory limit warnings?
>>
>> Is there a way to tell when it is safe to restart Kudu to avoid these
>> issues? Should we use any special procedure when restarting (e.g. only
>> restart the tablet servers, only restart one tablet server at a time or
>> something like that)?
>>
>> The table design uses 50 tablets per day (times 90 days). It is 8 TB of
>> data after 3xreplication over 5 tablet servers.
>>
>> Thanks,
>> Petter
>>
>>
>>
>
>
> --
> Andrew Wong
>



-- 
Andrew Wong
