RE: Slow queries after massive deletions. Is it due to compaction?

2018-11-26 Thread Sergejs Andrejevs
Got it, thanks a lot for the links.
We’ll go with the upgrade option.

Kind Regards,
Sergejs Andrejevs

From: William Berkeley [mailto:wdberke...@cloudera.com]
Sent: Monday, November 26, 2018 8:13 AM
To: user@kudu.apache.org
Subject: Re: Slow queries after massive deletions. Is it due to compaction?

Hi Sergejs. You are correct. Kudu tracks deletes as past data plus a "redo" 
that contains delete operations. The base data and the redos are stored on disk 
separately and are logically reconciled on scan.
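
As a rough mental model of that reconciliation (a toy Python sketch only, not Kudu's 
actual data structures; ToyRowset and its methods are made up for illustration), a 
DELETE only appends to the redo log, and every scan still reads the full base data and 
filters the deleted rows out:

# Toy model: base data is immutable once written; deletes accumulate in a
# separate "redo" log and are applied while scanning.
class ToyRowset:
    def __init__(self, rows):
        self.base_rows = rows          # written once, never rewritten in place
        self.redo_deletes = set()      # primary keys deleted since the base was written

    def delete(self, key):
        # A DELETE does not touch the base data; it only grows the redo log.
        self.redo_deletes.add(key)

    def scan(self):
        # Every scan still reads all base rows and filters out the deleted ones,
        # which is why a fully deleted table can stay slow to scan until the
        # deletes are compacted away together with the base data.
        for key, row in self.base_rows.items():
            if key not in self.redo_deletes:
                yield row

rowset = ToyRowset({i: {"id": i} for i in range(1_000_000)})
for i in range(1_000_000):
    rowset.delete(i)
print(sum(1 for _ in rowset.scan()))   # 0 rows returned, but 1M rows were still read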

Brock is right that this situation is improved greatly for certain deletion 
patterns with the fix to KUDU-2429. In particular, if deletions come in large 
contiguous blocks (where contiguity is determined by the primary key ordering), 
then the KUDU-2429 improvement will greatly increase the speed of scans over 
deleted data. Your problem might be solved by upgrading to a version that 
contains that improvement.

Unfortunately, as implied by the description for the 
--tablet_delta_store_major_compact_min_ratio flag, major delta compaction will 
not eliminate delete operations in the redo files. Those are only compacted 
with the base data when a merge compaction (also known as a rowset compaction) 
occurs. However, right now that sort of compaction is triggered only by having 
rowsets whose minimum and maximum key bounds overlap. So, for example, if you 
are inserting data in increasing or decreasing primary key order, there won't 
be any merge compactions. 
KUDU-1625 (https://issues.apache.org/jira/browse/KUDU-1625) tracks the 
improvement to trigger rowset compaction based on having a high percentage of 
deleted data.
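
To make that trigger concrete, here is a minimal sketch of the overlap rule 
(illustrative only; the bounds below are invented, and real rowset bounds come from the 
tablet's on-disk metadata):

# Rowsets are merge-compaction candidates only when their [min_key, max_key]
# bounds overlap.
def bounds_overlap(a, b):
    """True if two (min_key, max_key) ranges overlap."""
    return a[0] <= b[1] and b[0] <= a[1]

# Inserting in strictly increasing primary-key order produces rowsets with
# disjoint key ranges, so nothing ever qualifies for a merge compaction:
sequential = [(0, 999), (1000, 1999), (2000, 2999)]
print(any(bounds_overlap(a, b)
          for i, a in enumerate(sequential)
          for b in sequential[i + 1:]))      # False -> no merge compaction

# Out-of-order inserts create overlapping rowsets, which can be merge-compacted
# (and only then are the delete redos folded into the base data):
print(bounds_overlap((0, 5000), (2500, 7500)))   # True -> compaction candidate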

I can't think of a great workaround without potentially changing the 
partitioning or schema of the table, unfortunately. It's possible to coax Kudu 
into doing merge compactions by inserting rows in the same approximate key 
range as the deleted data and deleting them quickly. This would hopefully cause 
the merge compaction to compact away a lot of the older deleted data, but it 
would leave the newly inserted-and-deleted data behind. Plus, if there are concurrent 
queries, this sort of workaround could cause wrong results (queries may see the 
temporary rows), and filtering them out efficiently, at least Kudu-side, would mean a 
change to the primary key.

There is a good solution if you can change your partitioning scheme. If you use 
range partitioning to group together the blocks of rows that will be deleted, 
for example by range partitioning with one partition by day and deleting a day 
at a time, then deletes can be done efficiently by dropping range partitions. 
See range partitioning 
(https://kudu.apache.org/docs/schema_design.html#range-partitioning). 
Besides the restrictions this places on your schema and on the manner in which 
deletes can be done efficiently, keep in mind that dropping a range partition 
is not transactional and scans concurrent with drops of range partitions do not 
have consistency guarantees.
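
If you manage the table from the Python client, a sketch of that pattern could look 
like the following. Treat it as illustrative only: the table and column names are 
invented, and the range-partition and alterer calls (set_range_partition_columns, 
add_range_partition, new_table_alterer, drop_range_partition) are assumptions based on 
recent kudu-python versions, so please check the client documentation for the version 
you actually run:

import kudu
from kudu.client import Partitioning
from datetime import datetime

client = kudu.connect(host='kudu-master.example.com', port=7051)  # hypothetical master

# Put the event day in the primary key so it can drive range partitioning.
builder = kudu.schema_builder()
builder.add_column('event_day').type(kudu.unixtime_micros).nullable(False)
builder.add_column('id').type(kudu.int64).nullable(False)
builder.add_column('payload').type(kudu.string)
builder.set_primary_keys(['event_day', 'id'])
schema = builder.build()

# One range partition per day, plus hash partitioning within the day.
partitioning = Partitioning()
partitioning.add_hash_partitions(column_names=['id'], num_buckets=4)
partitioning.set_range_partition_columns(['event_day'])
partitioning.add_range_partition(lower_bound={'event_day': datetime(2018, 11, 21)},
                                 upper_bound={'event_day': datetime(2018, 11, 22)})
partitioning.add_range_partition(lower_bound={'event_day': datetime(2018, 11, 22)},
                                 upper_bound={'event_day': datetime(2018, 11, 23)})
client.create_table('events_by_day', schema, partitioning)

# "Deleting" a day is then a metadata-only operation: drop its range partition
# instead of issuing per-row DELETEs that linger as redo deltas until compaction.
table = client.table('events_by_day')
alterer = client.new_table_alterer(table)
alterer.drop_range_partition(lower_bound={'event_day': datetime(2018, 11, 21)},
                             upper_bound={'event_day': datetime(2018, 11, 22)})
alterer.alter()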

-Will

On Sun, Nov 25, 2018 at 3:40 PM Brock Noland <br...@phdata.io> wrote:
Hi,

I believe you are hitting a known issue, and I think that if you upgrade to CDH 5.15.1 
you'll see the fix:

https://www.cloudera.com/documentation/enterprise/release-notes/topics/kudu_fixed_issues.html#fixed-5-15-1

"Greatly improved the performance of many types of queries on tables from which 
many rows have been deleted."

I think there might have been more than one JIRA, but here is one of the fixes:

https://issues.apache.org/jira/browse/KUDU-2429

On Thu, Nov 22, 2018 at 9:57 AM Sergejs Andrejevs <s.andrej...@intrum.com> wrote:
Hi,

Is there a way to trigger a MajorDeltaCompactionOp for a table/tablet/rowset?

We’ve run into an issue:

0.   Kudu table is created
1.   Data is inserted there
2.   Run a select query - it goes fast (a matter of a few seconds)
3.   Delete all data from the table (but without dropping the table)
4.   Run the select query again - it goes slow (4-6 minutes)

Investigating and reading the Kudu documentation has led us to think that delete 
operations are done logically: physically the table still contains the written data, 
and the deletes are applied on top of it at every scan.
I had a look at the kudu tablet and there are quite large “redo” blocks (see one of the 
rowset examples below).
There was a thought that compression and encoding play a role here (reducing the 
chances of running a compaction), but removing them (keeping the column defaults) 
hasn’t helped either.
We run tservers with:

-  maintenance_manager_num_threads=10 (increased compared to the default)

-  tablet_delta_store_major_compact_min_ratio=0.1000149011612 (default value)

-  kudu 1.7.0-cdh5.15.0

From the documentation and comments in the code I saw the description of 
tablet_delta_store_major_compact_min_ratio: “Minimum ratio of sizeof(deltas) to 
sizeof(base data) before a major compaction.”
And “Major compactions: the score will be the result of sizeof(deltas)/sizeof(base 
data), unless it is smaller than tablet_delta_store_major_compact_min_ratio or if the 
delta files are only composed of deletes, in which case the score is brought down to 
zero.”
So basically the table stays in such a state for more than a day.
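
To make the quoted scoring rule concrete, here is a minimal sketch of it (illustrative 
only; the function and its arguments are made up, and the real logic lives in the 
tablet server's maintenance manager):

def major_delta_compaction_score(delta_bytes, base_bytes, deletes_only,
                                 min_ratio=0.1):
    # Sketch of the quoted rule: score = sizeof(deltas) / sizeof(base data),
    # forced to zero when the ratio is below min_ratio or the deltas consist
    # only of deletes.
    score = delta_bytes / base_bytes
    if score < min_ratio or deletes_only:
        return 0.0
    return score

# Redo files holding updates that are large relative to the base data get a
# positive score and can be scheduled:
print(major_delta_compaction_score(200 * 2**20, 1024 * 2**20, deletes_only=False))  # ~0.195

# Redo files made purely of deletes score zero, so a major delta compaction is
# never scheduled for them - the tablet stays in this state until a merge
# (rowset) compaction rewrites the base data:
print(major_delta_compaction_score(200 * 2**20, 1024 * 2**20, deletes_only=True))   # 0.0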
