Re: Slow queries after massive deletions. Is it due to compaction?

Brock Noland Sun, 25 Nov 2018 15:40:36 -0800

Hi,

I believe you are hitting a known issue, and I think upgrading to
CDH 5.15.1 will give you the fix:


https://www.cloudera.com/documentation/enterprise/release-notes/topics/kudu_fixed_issues.html#fixed-5-15-1

"Greatly improved the performance of many types of queries on tables from
which many rows have been deleted."

I think there might have been more than one JIRA, but here is one of the
fixes:

https://issues.apache.org/jira/browse/KUDU-2429

On Thu, Nov 22, 2018 at 9:57 AM Sergejs Andrejevs <s.andrej...@intrum.com>
wrote:

> Hi,
>
>
>
> Is there a way to invoke MajorDeltaCompactionOp for a table/tablet/rowset?
>
>
>
> We’ve run into an issue:
>
> 0. A Kudu table is created
>
> 1. Data is inserted into it
>
> 2. Run a select query: it is fast (a matter of a few seconds)
>
> 3. Delete all data from the table (without dropping the table)
>
> 4. Run the select query again: it is slow (4-6 minutes)
>
>
>
> Investigating and reading the Kudu documentation led us to think that
> deletes are applied logically: physically the table still contains the
> written data, and the deletes are applied on top of it each time.
>
> I had a look at a Kudu tablet and there are quite large “redo” blocks (see
> one rowset example below).
>
> We thought that compression and encoding might play a role (reducing the
> chance that compaction runs), but removing them (keeping the column
> defaults) didn’t help either.
>
> We run the tservers with:
>
> - maintenance_manager_num_threads=10 (increased compared to the default)
>
> - tablet_delta_store_major_compact_min_ratio=0.10000000149011612
> (default value)
>
> - kudu 1.7.0-cdh5.15.0
>
>
>
> From the documentation and comments in the code, I found this description of
> tablet_delta_store_major_compact_min_ratio: “Minimum ratio of
> sizeof(deltas) to sizeof(base data) before a major compaction.”
>
> And: “Major compactions: the score will be the result of
> sizeof(deltas)/sizeof(base data), unless it is smaller than
> tablet_delta_store_major_compact_min_ratio or if the delta files are only
> composed of deletes, in which case the score is brought down to zero.”
>
> So basically the table has stayed in this state for more than a day.
>
>
>
> While the majority of tables will mostly be scanned, there will be a couple
> of large tables with a large number of deletions (though not of all data).
>
> Could you advise how to improve scan performance after large deletions?
>
>
>
> block-id | block-kind  | column | cfile-size | cfile-data-type | cfile-delta-stats                                                                                                           | cfile-encoding  | cfile-compression
> ---------+-------------+--------+------------+-----------------+-----------------------------------------------------------------------------------------------------------------------------+-----------------+------------------
> 24693586 | column      | var1   | 2.80M      | int64           |                                                                                                                             | BIT_SHUFFLE     | NO_COMPRESSION
> 24693587 | column      | var2   | 100.7K     | int64           |                                                                                                                             | BIT_SHUFFLE     | NO_COMPRESSION
> 24693588 | column      | var3   | 4.95M      | int64           |                                                                                                                             | BIT_SHUFFLE     | NO_COMPRESSION
> 24693589 | column      | var4   | 1.58M      | string          |                                                                                                                             | DICT_ENCODING   | LZ4
> 24693590 | column      | var5   | 8.82M      | string          |                                                                                                                             | PLAIN_ENCODING  | LZ4
> 24693591 | column      | var6   | 2.7K       | string          |                                                                                                                             | DICT_ENCODING   | LZ4
> 24700691 | redo        |        | 14.04M     | binary          | ts range=[6319363930065100800, 6319364908129189926], delete_count=[2190649], reinsert_count=[0], update_counts_by_col_id=[] | PLAIN_ENCODING  | LZ4
> 24693592 | bloom       |        | 5.04M      | binary          |                                                                                                                             | PLAIN_ENCODING  | NO_COMPRESSION
> 24693593 | adhoc-index |        | 8.94M      | binary          |                                                                                                                             | PREFIX_ENCODING | LZ4
>
>
>
> Kind Regards,
>
> Sergejs Andrejevs
>
>
>
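For illustration, the scoring rule quoted above can be applied to the rowset dump from the original message. This is a minimal Python sketch of the quoted description only, not Kudu's actual implementation. The helper names are hypothetical, and three things are assumptions: that only the `column` blocks count as base data, that K and M mean KiB and MiB, and that a redo file counts as "only deletes" when it has deletes but no reinserts or per-column updates.

```python
import re

# Assumption: "K" and "M" in the dump mean KiB and MiB.
UNITS = {"K": 1024, "M": 1024 * 1024}


def to_bytes(size: str) -> float:
    """Convert a size such as '14.04M' or '2.7K' to bytes."""
    return float(size[:-1]) * UNITS[size[-1]]


def is_delete_only(delta_stats: str) -> bool:
    """Assumed reading of 'only composed of deletes': deletes present,
    no reinserts, and an empty update_counts_by_col_id list."""
    deletes = int(re.search(r"delete_count=\[(\d+)\]", delta_stats).group(1))
    reinserts = int(re.search(r"reinsert_count=\[(\d+)\]", delta_stats).group(1))
    updates = re.search(r"update_counts_by_col_id=\[(.*?)\]", delta_stats).group(1)
    return deletes > 0 and reinserts == 0 and updates.strip() == ""


def major_compaction_score(delta_bytes, base_bytes, delete_only, min_ratio=0.1):
    """Score as described in the quoted comment (hypothetical helper):
    sizeof(deltas)/sizeof(base data), forced to zero when the ratio is
    below min_ratio or when the deltas contain only deletes."""
    if delete_only or base_bytes <= 0:
        return 0.0
    ratio = delta_bytes / base_bytes
    return ratio if ratio >= min_ratio else 0.0


# Sizes and stats copied from the rowset dump in the original message.
base = sum(to_bytes(s) for s in ["2.80M", "100.7K", "4.95M", "1.58M", "8.82M", "2.7K"])
redo = to_bytes("14.04M")
stats = ("ts range=[6319363930065100800, 6319364908129189926], "
         "delete_count=[2190649], reinsert_count=[0], update_counts_by_col_id=[]")

print(major_compaction_score(redo, base, delete_only=False))          # size ratio alone
print(major_compaction_score(redo, base, is_delete_only(stats)))      # delete-only rule applied
```

Under these assumptions the size ratio alone is roughly 0.77, well above the 0.1 threshold, yet the delete-only rule brings the score down to zero. That is consistent with the table staying uncompacted for more than a day.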
