Hi Petter
Don't have answers yet, but I do have some more questions
(inline)
Petter von Dolwitz (Hem) writes:
Hi David,
In short to summarize:
1. I ingest data. Kudu's maintenance threads stop working (soft memory limit) and incoming data is throttled. There are no errors reported on the client side.
What is the "client side"? impala? spark? java/c++.
2. I stop ingestion and wait until I *think* Kudu is finished.
The question above is pertinent. Impala will not return until a query is fully successful, though it may return an error and leave a query only half-way executed. If you're using the client APIs directly, though, are you checking for errors when inserting?
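For reference, with the Java client row-level failures don't necessarily surface as exceptions when flushing in the background; you have to pull them from the session. A rough, untested sketch (master address, table and column names are made up here):

import org.apache.kudu.client.*;

public class InsertWithErrorChecks {
    public static void main(String[] args) throws KuduException {
        // Placeholder master address and table/column names.
        try (KuduClient client =
                 new KuduClient.KuduClientBuilder("kudu-master:7051").build()) {
            KuduTable table = client.openTable("my_table");
            KuduSession session = client.newSession();
            // With background flushes, apply() does not surface row errors directly;
            // they must be collected from the session afterwards.
            session.setFlushMode(SessionConfiguration.FlushMode.AUTO_FLUSH_BACKGROUND);

            Insert insert = table.newInsert();
            PartialRow row = insert.getRow();
            row.addLong("id", 1L);          // hypothetical key column
            row.addString("payload", "x");  // hypothetical value column
            session.apply(insert);

            session.flush();
            // Row-level failures (e.g. writes rejected under memory pressure)
            // show up here, not as exceptions from apply().
            if (session.countPendingErrors() > 0) {
                RowErrorsAndOverflowStatus errs = session.getPendingErrors();
                for (RowError e : errs.getRowErrors()) {
                    System.err.println("Failed row: " + e);
                }
            }
            session.close();
        }
    }
}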
3. I restart Kudu.
4. I validate the inserted data by doing count(*) on groups of data in Kudu. For several groups, Kudu reports a lot of rows missing.
Kudu's default scan mode is READ_LATEST. While this is the most performance-oriented mode, it's also the one with the least guarantees, so on startup it's possible that it reads from a stale replica, giving the _appearance_ that rows went missing. Things to try here:
- Try the same query a few minutes later. Is the answer different?
- If the above is true, consider changing your scan mode to READ_AT_SNAPSHOT (see the sketch after this list). In this mode data is guaranteed not to be stale, though you might have to wait for all replicas to be ready.
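With the Java client that would look roughly like the following (an untested sketch; the master address and table name are placeholders):

import org.apache.kudu.client.*;

public class SnapshotCount {
    public static void main(String[] args) throws KuduException {
        // Placeholder master address and table name.
        try (KuduClient client =
                 new KuduClient.KuduClientBuilder("kudu-master:7051").build()) {
            KuduTable table = client.openTable("my_table");
            // READ_AT_SNAPSHOT waits until the chosen replica has applied all
            // operations up to the snapshot timestamp, so a lagging or freshly
            // restarted replica cannot silently return fewer rows.
            KuduScanner scanner = client.newScannerBuilder(table)
                .readMode(AsyncKuduScanner.ReadMode.READ_AT_SNAPSHOT)
                .build();
            long count = 0;
            while (scanner.hasMoreRows()) {
                RowResultIterator batch = scanner.nextRows();
                count += batch.getNumRows();
            }
            scanner.close();
            System.out.println("rows = " + count);
        }
    }
}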
5. I ingest the same data again. Client reports that all rows are already present.
This isn't surprising _if_ the problem is indeed from stale replicas.
6. Doing the count(*) exercise again now gives me the correct number of rows.
This tells me that the data was ingested into Kudu on the first attempt but a scan did not find the data. Inserting the data again made it visible.
Can it be that by the second scan enough time had simply elapsed for all replicas to catch up? I'd say this is likely the case.
Br,
Petter
2017-12-07 21:39 GMT+01:00 David Alves <[email protected]>:
Hi Petter
I'd like to clarify what exactly happened and exactly what you are referring to as "inconsistency".
From what I understand of the first error you observed, Kudu was underprovisioned, memory-wise, and the ingest jobs/queries failed. Is that right? Since Kudu doesn't have atomic multi-row writes, it's currently expected in this case that you'll end up with partially written data.
If you tried the same job again and it succeeded, then for certain types of operation (UPSERT, INSERT IGNORE) the remaining rows would be written and all the data would be there as expected.
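For example, an idempotent retry with the Java client could look roughly like this (untested sketch; master address, table and column names are made up, and Impala's UPSERT statement achieves the same from SQL):

import org.apache.kudu.client.*;

public class RetryWithUpsert {
    public static void main(String[] args) throws KuduException {
        // Placeholder master address and table/column names.
        try (KuduClient client =
                 new KuduClient.KuduClientBuilder("kudu-master:7051").build()) {
            KuduTable table = client.openTable("my_table");
            KuduSession session = client.newSession();

            // UPSERT inserts the row if the key is new and overwrites it otherwise,
            // so rerunning the whole job after a partial failure fills in only the
            // rows that are actually missing, without "row already present" errors.
            Upsert upsert = table.newUpsert();
            PartialRow row = upsert.getRow();
            row.addLong("id", 1L);
            row.addString("payload", "x");

            // Default flush mode is AUTO_FLUSH_SYNC, so apply() returns the
            // per-operation response directly.
            OperationResponse resp = session.apply(upsert);
            if (resp.hasRowError()) {
                System.err.println("Upsert failed: " + resp.getRowError());
            }
            session.close();
        }
    }
}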
I'd like to distinguish this lack of atomicity on multi-row transactions from "inconsistency", which is what you might observe if an operation didn't fail but you couldn't see all the data. For this latter case there are options you can choose to avoid any inconsistency.
Best
David
On Wed, Dec 6, 2017 at 4:26 AM, Petter von Dolwitz (Hem) <
[email protected]> wrote:
Thanks for your reply Andrew!
>How did you verify that all the data was inserted and how did you find some data missing?
This was done using Impala. We counted the rows for groups representing the chunks we inserted.
>Following up on what I posted, take a look at https://kudu.apache.org/docs/transaction_semantics.html#_read_operations_scans. It seems definitely possible that not all of the rows had finished inserting when counting, or that the scans were sent to a stale replica.
Before we shut down we could only see the following in the logs, i.e. no sign that ingestion was still ongoing.
kudu-tserver.ip-xx-yyy-z-nnn.root.log.INFO.20171201-065232.90314:I1201 07:27:35.010694 90793 maintenance_manager.cc:383] P a38902afefca4a85a5469d149df9b4cb: we have exceeded our soft memory limit (current capacity is 67.52%). However, there are no ops currently runnable which would free memory.
Also, the (Cloudera) metric total_kudu_rows_inserted_rate_across_kudu_replicas showed zero.
Still, it seems like some data became inconsistent after restart. But if the maintenance_manager performs important jobs that are required to ensure that all data is inserted, then I can understand why we ended up with inconsistent data. But, if I understand you correctly, you are saying that these jobs are not critical for ingestion. In the link you provided I read "Impala scans are currently performed as READ_LATEST and have no consistency guarantees.". I would assume this means that it does not guarantee consistency if new data is inserted, but should give valid (and the same) results if no new data is inserted?
I have not tried the ksck tool yet. Thank you for the reminder, I will have a look.
Br,
Petter
2017-12-06 1:31 GMT+01:00 Andrew Wong <[email protected]>:
How did you verify that all the data was inserted and how did you find some data missing? I'm wondering if it's possible that the initial "missing" data was data that Kudu was still in the process of inserting (albeit slowly, due to memory backpressure or some such).
Following up on what I posted, take a look at https://kudu.apache.org/docs/transaction_semantics.html#_read_operations_scans. It seems definitely possible that not all of the rows had finished inserting when counting, or that the scans were sent to a stale replica.
On Tue, Dec 5, 2017 at 4:18 PM, Andrew Wong
<[email protected]> wrote:
Hi Petter,
When we verified that all data was inserted we found that some data was missing. We added this missing data and on some chunks we got the information that all rows were already present, i.e. Impala says something like Modified: 0 rows, nnnnnnn errors. Doing the verification again now shows that the Kudu table is complete. So, even though we did not insert any data on some chunks, a count(*) operation over these chunks now returns a different value.
How did you verify that all the data was inserted and how did you find some data missing? I'm wondering if it's possible that the initial "missing" data was data that Kudu was still in the process of inserting (albeit slowly, due to memory backpressure or some such).
Now to my question. Will data be inconsistent if we recycle Kudu after seeing soft memory limit warnings?
Your data should be consistently written, even with those warnings. AFAIK they would cause a bit of slowness, not incorrect results.
Is there a way to tell when it is safe to restart Kudu to avoid these issues? Should we use any special procedure when restarting (e.g. only restart the tablet servers, only restart one tablet server at a time, or something like that)?
In general, you can use the `ksck` tool to check the health of your cluster. See https://kudu.apache.org/docs/command_line_tools_reference.html#cluster-ksck for more details. For restarting a cluster, I would recommend taking down all tablet servers at once, otherwise tablet replicas may try to replicate data from the server that was taken down.
Hope this helped,
Andrew
On Tue, Dec 5, 2017 at 10:42 AM, Petter von Dolwitz (Hem) <
[email protected]> wrote:
Hi Kudu users,
We just started to use Kudu (1.4.0+cdh5.12.1). To make a baseline for evaluation we ingested 3 months' worth of data. During ingestion we were facing messages from the maintenance threads that a soft memory limit was reached. It seems like the background maintenance threads stopped performing their tasks at this point in time. It also seems like the memory was never recovered even after stopping ingestion, so I guess there was a large backlog being built up. I guess the root cause here is that we were a bit too conservative when giving Kudu memory. After a restart a lot of maintenance tasks were started (i.e. compaction).
When we verified that all data was inserted we found that some data was missing. We added this missing data and on some chunks we got the information that all rows were already present, i.e. Impala says something like Modified: 0 rows, nnnnnnn errors. Doing the verification again now shows that the Kudu table is complete. So, even though we did not insert any data on some chunks, a count(*) operation over these chunks now returns a different value.
Now to my question. Will data be inconsistent if we recycle Kudu after seeing soft memory limit warnings?
Is there a way to tell when it is safe to restart Kudu to avoid these issues? Should we use any special procedure when restarting (e.g. only restart the tablet servers, only restart one tablet server at a time, or something like that)?
The table design uses 50 tablets per day (times 90 days). It is 8 TB of data after 3x replication over 5 tablet servers.
Thanks,
Petter
--
Andrew Wong
--
Andrew Wong
--
David Alves