Hi Todd,

I have not yet set aside time to re-create the issue. I still aim to do
this. When I do, I will collect all relevant logs and create a JIRA.
Thank you for following up.

Br,
Petter

2018-01-05 0:35 GMT+01:00 Todd Lipcon <t...@cloudera.com>:

> Hey Petter,
>
> Did you ever get to the bottom of this? We definitely don't expect Kudu
> to lose data on a restart (and we have hundreds of tests running
> continuously which try to ensure this).
>
> -Todd
>
> On Fri, Dec 8, 2017 at 10:13 PM, David Alves <davidral...@gmail.com> wrote:
>
>> Hi Petter,
>>
>> I don't have answers yet, but I do have some more questions (inline).
>>
>> Petter von Dolwitz (Hem) writes:
>>
>>> Hi David,
>>>
>>> In short, to summarize:
>>>
>>> 1. I ingest data. Kudu's maintenance threads stop working (soft memory
>>> limit) and incoming data is throttled. There are no errors reported on
>>> the client side.
>>
>> What is the "client side"? Impala? Spark? Java/C++?
>>
>>> 2. I stop ingestion and wait until I *think* Kudu is finished.
>>
>> The question above is pertinent. Impala will not return until a query
>> is fully successful, though it may return an error and leave a query
>> only half-way executed. If you're using the client APIs directly,
>> though, are you checking for errors when inserting?
>>
>>> 3. I restart Kudu.
>>> 4. I validate the inserted data by doing count(*) on groups of data in
>>> Kudu. For several groups, Kudu reports a lot of rows missing.
>>
>> Kudu's default scan mode is READ_LATEST. While this is the most
>> performance-oriented mode, it's also the one with the fewest
>> guarantees, so on startup it's possible that it reads from a stale
>> replica, giving the _appearance_ that rows went missing. Things to try
>> here:
>> - Try the same query a few minutes later. Is the answer different?
>> - If so, consider changing your scan mode to READ_AT_SNAPSHOT. In this
>>   mode data is guaranteed not to be stale, though you might have to
>>   wait for all replicas to be ready.
>>
>>> 5. I ingest the same data again. The client reports that all rows are
>>> already present.
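David's point above — that a READ_LATEST scan answered by a lagging replica can make rows *appear* missing, while READ_AT_SNAPSHOT forces the replica to catch up first — can be illustrated with a toy model. This is deliberately plain Python, not the Kudu client API, and the replica counts are invented for illustration:

```python
# Toy model of Kudu read modes (NOT the real client API): one of three
# replicas is lagging after a restart, so a scan routed to it under-counts.
import random

class Replica:
    def __init__(self, rows_applied):
        # How many of the committed rows this replica has applied so far.
        self.rows_applied = rows_applied

    def catch_up(self, committed_rows):
        # Replay the remainder of the write-ahead log.
        self.rows_applied = committed_rows

def count_read_latest(replicas):
    # READ_LATEST: answer immediately from whichever replica is chosen,
    # even if it has not applied all committed writes yet.
    return random.choice(replicas).rows_applied

def count_read_at_snapshot(replicas, committed_rows):
    # READ_AT_SNAPSHOT: the chosen replica must first apply everything up
    # to the snapshot timestamp, so it can never answer with stale data.
    r = random.choice(replicas)
    r.catch_up(committed_rows)
    return r.rows_applied

committed = 1_000_000
replicas = [Replica(1_000_000), Replica(1_000_000), Replica(600_000)]

latest_counts = {count_read_latest(replicas) for _ in range(100)}
print(sorted(latest_counts))  # may include 600000: the "missing" rows
print(count_read_at_snapshot(replicas, committed))
```

In the toy model the under-count disappears once the lagging replica catches up, matching the thread's observation that a later count(*) returned the correct number.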
>>
>> This isn't surprising _if_ the problem is indeed from stale replicas.
>>
>>> 6. Doing the count(*) exercise again now gives me the correct number
>>> of rows.
>>>
>>> This tells me that the data was ingested into Kudu on the first
>>> attempt but a scan did not find the data. Inserting the data again
>>> made it visible.
>>
>> Can it be that by the time of that second scan enough time had elapsed
>> for all replicas to catch up? I'd say this is likely the case.
>>
>>> Br,
>>> Petter
>>>
>>> 2017-12-07 21:39 GMT+01:00 David Alves <davidral...@gmail.com>:
>>>
>>>> Hi Petter,
>>>>
>>>> I'd like to clarify what exactly happened and what exactly you are
>>>> referring to as "inconsistency".
>>>> From what I understand of the first error you observed, the Kudu
>>>> cluster was under-provisioned, memory-wise, and the ingest
>>>> jobs/queries failed. Is that right? Since Kudu doesn't have atomic
>>>> multi-row writes, it's currently expected in this case that you'll
>>>> end up with partially written data.
>>>> If you tried the same job again and it succeeded, then for certain
>>>> types of operation (UPSERT, INSERT IGNORE) the remaining rows would
>>>> be written and all the data would be there as expected.
>>>> I'd like to distinguish this lack of atomicity on multi-row
>>>> transactions from "inconsistency", which is what you might observe if
>>>> an operation didn't fail but you couldn't see all the data. For this
>>>> latter case there are options you can choose to avoid any
>>>> inconsistency.
>>>>
>>>> Best
>>>> David
>>>>
>>>> On Wed, Dec 6, 2017 at 4:26 AM, Petter von Dolwitz (Hem) <
>>>> petter.von.dolw...@gmail.com> wrote:
>>>>
>>>>> Thanks for your reply Andrew!
>>>>>
>>>>>> How did you verify that all the data was inserted and how did you
>>>>>> find some data missing?
>>>>>
>>>>> This was done using Impala. We counted the rows for groups
>>>>> representing the chunks we inserted.
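David's distinction above — a failed job leaving partially written data, which a retry with UPSERT or INSERT IGNORE simply completes — can be sketched as a toy model. Again this is plain Python, not the Kudu or Impala API; the batch contents are invented:

```python
# Toy model (NOT the Kudu API) of partial writes and idempotent retries.
# A dict stands in for a Kudu table keyed by primary key.
table = {}

def insert(rows):
    """Plain INSERT: rows whose key already exists count as errors."""
    modified = errors = 0
    for key, value in rows:
        if key in table:
            errors += 1          # duplicate primary key
        else:
            table[key] = value
            modified += 1
    return modified, errors

def upsert(rows):
    """UPSERT: overwrite-or-insert, so re-running a job is always safe."""
    for key, value in rows:
        table[key] = value
    return len(rows)

batch = [(i, f"row-{i}") for i in range(10)]

insert(batch[:6])        # the "failed" job wrote only part of the batch
m, e = insert(batch)     # retry: the 6 surviving rows are reported as errors
print(f"Modified: {m} rows, {e} errors")   # Modified: 4 rows, 6 errors
upsert(batch)            # with UPSERT the retry raises no errors at all
print(len(table))        # 10
```

This mirrors the "Modified: 0 rows, nnnnnnn errors" message Petter saw from Impala when re-inserting chunks that had in fact already landed.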
>>>>>
>>>>>> Following up on what I posted, take a look at
>>>>>> https://kudu.apache.org/docs/transaction_semantics.html#_read_operations_scans.
>>>>>> It seems definitely possible that not all of the rows had finished
>>>>>> inserting when counting, or that the scans were sent to a stale
>>>>>> replica.
>>>>>
>>>>> Before we shut down we could only see the following in the logs,
>>>>> i.e., no sign that ingestion was still ongoing:
>>>>>
>>>>> kudu-tserver.ip-xx-yyy-z-nnn.root.log.INFO.20171201-065232.90314:I1201
>>>>> 07:27:35.010694 90793 maintenance_manager.cc:383] P
>>>>> a38902afefca4a85a5469d149df9b4cb: we have exceeded our soft memory
>>>>> limit (current capacity is 67.52%). However, there are no ops
>>>>> currently runnable which would free memory.
>>>>>
>>>>> Also, the (Cloudera) metric
>>>>> total_kudu_rows_inserted_rate_across_kudu_replicas showed zero.
>>>>>
>>>>> Still, it seems like some data became inconsistent after the
>>>>> restart. If the maintenance manager performs important jobs that are
>>>>> required to ensure that all data is inserted, then I can understand
>>>>> why we ended up with inconsistent data. But, if I understand you
>>>>> correctly, you are saying that these jobs are not critical for
>>>>> ingestion. In the link you provided I read "Impala scans are
>>>>> currently performed as READ_LATEST and have no consistency
>>>>> guarantees." I would assume this means that consistency is not
>>>>> guaranteed while new data is being inserted, but that results should
>>>>> be valid (and repeatable) when no new data is being inserted?
>>>>>
>>>>> I have not tried the ksck tool yet. Thank you for the reminder; I
>>>>> will have a look.
>>>>>
>>>>> Br,
>>>>> Petter
>>>>>
>>>>> 2017-12-06 1:31 GMT+01:00 Andrew Wong <aw...@cloudera.com>:
>>>>>
>>>>>>> How did you verify that all the data was inserted and how did you
>>>>>>> find some data missing?
>>>>>>> I'm wondering if it's possible that the initial "missing" data was
>>>>>>> data that Kudu was still in the process of inserting (albeit
>>>>>>> slowly, due to memory backpressure or some such).
>>>>>>
>>>>>> Following up on what I posted, take a look at
>>>>>> https://kudu.apache.org/docs/transaction_semantics.html#_read_operations_scans.
>>>>>> It seems definitely possible that not all of the rows had finished
>>>>>> inserting when counting, or that the scans were sent to a stale
>>>>>> replica.
>>>>>>
>>>>>> On Tue, Dec 5, 2017 at 4:18 PM, Andrew Wong <aw...@cloudera.com> wrote:
>>>>>>
>>>>>>> Hi Petter,
>>>>>>>
>>>>>>>> When we verified that all data was inserted we found that some
>>>>>>>> data was missing. We added this missing data and on some chunks
>>>>>>>> we got the information that all rows were already present, i.e.
>>>>>>>> Impala says something like "Modified: 0 rows, nnnnnnn errors".
>>>>>>>> Doing the verification again now shows that the Kudu table is
>>>>>>>> complete. So, even though we did not insert any data on some
>>>>>>>> chunks, a count(*) operation over these chunks now returns a
>>>>>>>> different value.
>>>>>>>
>>>>>>> How did you verify that all the data was inserted and how did you
>>>>>>> find some data missing? I'm wondering if it's possible that the
>>>>>>> initial "missing" data was data that Kudu was still in the process
>>>>>>> of inserting (albeit slowly, due to memory backpressure or some
>>>>>>> such).
>>>>>>>
>>>>>>>> Now to my question. Will data be inconsistent if we recycle Kudu
>>>>>>>> after seeing soft memory limit warnings?
>>>>>>>
>>>>>>> Your data should be consistently written, even with those
>>>>>>> warnings. AFAIK they would cause a bit of slowness, not incorrect
>>>>>>> results.
>>>>>>>
>>>>>>>> Is there a way to tell when it is safe to restart Kudu to avoid
>>>>>>>> these issues?
>>>>>>>> Should we use any special procedure when restarting (e.g. only
>>>>>>>> restart the tablet servers, only restart one tablet server at a
>>>>>>>> time, or something like that)?
>>>>>>>
>>>>>>> In general, you can use the `ksck` tool to check the health of
>>>>>>> your cluster. See
>>>>>>> https://kudu.apache.org/docs/command_line_tools_reference.html#cluster-ksck
>>>>>>> for more details. For restarting a cluster, I would recommend
>>>>>>> taking down all tablet servers at once, otherwise tablet replicas
>>>>>>> may try to replicate data from the server that was taken down.
>>>>>>>
>>>>>>> Hope this helped,
>>>>>>> Andrew
>>>>>>>
>>>>>>> On Tue, Dec 5, 2017 at 10:42 AM, Petter von Dolwitz (Hem) <
>>>>>>> petter.von.dolw...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Kudu users,
>>>>>>>>
>>>>>>>> We just started to use Kudu (1.4.0+cdh5.12.1). To make a baseline
>>>>>>>> for evaluation we ingested 3 months' worth of data. During
>>>>>>>> ingestion we were facing messages from the maintenance threads
>>>>>>>> that a soft memory limit was reached. It seems like the
>>>>>>>> background maintenance threads stopped performing their tasks at
>>>>>>>> this point in time. It also seems like the memory was never
>>>>>>>> recovered, even after stopping ingestion, so I guess there was a
>>>>>>>> large backlog being built up. I guess the root cause here is that
>>>>>>>> we were a bit too conservative when giving Kudu memory. After a
>>>>>>>> restart a lot of maintenance tasks were started (i.e.
>>>>>>>> compaction).
>>>>>>>>
>>>>>>>> When we verified that all data was inserted we found that some
>>>>>>>> data was missing. We added this missing data and on some chunks
>>>>>>>> we got the information that all rows were already present, i.e.
>>>>>>>> Impala says something like "Modified: 0 rows, nnnnnnn errors".
>>>>>>>> Doing the verification again now shows that the Kudu table is
>>>>>>>> complete. So, even though we did not insert any data on some
>>>>>>>> chunks, a count(*) operation over these chunks now returns a
>>>>>>>> different value.
>>>>>>>>
>>>>>>>> Now to my question. Will data be inconsistent if we recycle Kudu
>>>>>>>> after seeing soft memory limit warnings?
>>>>>>>>
>>>>>>>> Is there a way to tell when it is safe to restart Kudu to avoid
>>>>>>>> these issues? Should we use any special procedure when restarting
>>>>>>>> (e.g. only restart the tablet servers, only restart one tablet
>>>>>>>> server at a time, or something like that)?
>>>>>>>>
>>>>>>>> The table design uses 50 tablets per day (times 90 days). It is
>>>>>>>> 8 TB of data after 3x replication over 5 tablet servers.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Petter
>>>>>>>
>>>>>>> --
>>>>>>> Andrew Wong
>>>>>>
>>>>>> --
>>>>>> Andrew Wong
>>
>> --
>> David Alves
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
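For reference, the `ksck` health check Andrew recommends in the thread is run against the cluster's master addresses; the hostnames below are placeholders, and the command needs a live cluster to talk to:

```shell
# Check cluster/tablet health before and after a restart.
# Replace the placeholder hostnames with your actual Kudu master addresses.
kudu cluster ksck master-1.example.com,master-2.example.com,master-3.example.com
```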