Hi, I believe you are hitting a known issue and I think if you upgrade to 5.15.1 you'll see the fix:
https://www.cloudera.com/documentation/enterprise/release-notes/topics/kudu_fixed_issues.html#fixed-5-15-1

"Greatly improved the performance of many types of queries on tables from which many rows have been deleted."

I think there might have been more than one JIRA, but here is one of the fixes:
https://issues.apache.org/jira/browse/KUDU-2429

On Thu, Nov 22, 2018 at 9:57 AM Sergejs Andrejevs <s.andrej...@intrum.com> wrote:

> Hi,
>
> Is there a way to trigger MajorDeltaCompactionOp for a table/tablet/rowset?
>
> We’ve run into an issue:
> 0. Kudu table is created
> 1. Data is inserted there
> 2. Run select query - it goes fast (a matter of a few seconds)
> 3. Delete all data from the table (without dropping the table)
> 4. Run select query - it goes slow (4-6 minutes)
>
> Investigating and reading the Kudu documentation led us to think that
> delete operations are done logically, while physically the table still
> contains the written data and the deletes are applied on top of it on
> every scan.
>
> I had a look at a Kudu tablet and there are quite large “redo” blocks
> (see one of the rowset examples below).
>
> We thought that compression and encoding might play a role (reducing the
> chances of a compaction running), but removing them (keeping the column
> defaults) hasn’t helped either.
>
> We run tservers with:
> - maintenance_manager_num_threads=10 (increased compared to the default)
> - tablet_delta_store_major_compact_min_ratio=0.10000000149011612
>   (default value)
> - kudu 1.7.0-cdh5.15.0
>
> From the documentation and comments in the code I saw the description of
> tablet_delta_store_major_compact_min_ratio: “Minimum ratio of
> sizeof(deltas) to sizeof(base data) before a major compaction.”
>
> And “Major compactions: the score will be the result of
> sizeof(deltas)/sizeof(base data), unless it is smaller than
> tablet_delta_store_major_compact_min_ratio or if the delta files are only
> composed of deletes, in which case the score is brought down to zero.”
>
> So basically the table has stayed in this state for more than a day.
>
> While the majority of tables will mostly see scans, there will be a couple
> of large tables with a large number of deletions (but not of all data).
>
> Could you advise how to improve scans after large deletions?
>
> block-id | block-kind  | column | cfile-size | cfile-data-type | cfile-encoding  | cfile-compression
> ---------+-------------+--------+------------+-----------------+-----------------+------------------
> 24693586 | column      | var1   | 2.80M      | int64           | BIT_SHUFFLE     | NO_COMPRESSION
> 24693587 | column      | var2   | 100.7K     | int64           | BIT_SHUFFLE     | NO_COMPRESSION
> 24693588 | column      | var3   | 4.95M      | int64           | BIT_SHUFFLE     | NO_COMPRESSION
> 24693589 | column      | var4   | 1.58M      | string          | DICT_ENCODING   | LZ4
> 24693590 | column      | var5   | 8.82M      | string          | PLAIN_ENCODING  | LZ4
> 24693591 | column      | var6   | 2.7K       | string          | DICT_ENCODING   | LZ4
> 24700691 | redo        |        | 14.04M     | binary          | PLAIN_ENCODING  | LZ4
> 24693592 | bloom       |        | 5.04M      | binary          | PLAIN_ENCODING  | NO_COMPRESSION
> 24693593 | adhoc-index |        | 8.94M      | binary          | PREFIX_ENCODING | LZ4
>
> cfile-delta-stats (empty for every block except redo block 24700691):
> ts range=[6319363930065100800, 6319364908129189926],
> delete_count=[2190649], reinsert_count=[0], update_counts_by_col_id=[]
>
> Kind Regards,
> Sergejs Andrejevs
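
To make the quoted scoring rule concrete, here is a rough Python sketch that applies it to the rowset listed above. It is only an illustration of the rule as worded in the quoted flag description, not Kudu's actual code: the function name is made up, counting only the six column blocks as "base data" is an assumption, and so is reading K/M as binary units.

```python
# Sketch of the major delta compaction score as worded in the quoted
# description. NOT Kudu's implementation; the function name and the choice
# of which blocks count as "base data" are assumptions for illustration.

# tablet_delta_store_major_compact_min_ratio as reported in the thread.
MIN_RATIO = 0.10000000149011612


def major_compaction_score(delta_bytes, base_bytes, only_deletes,
                           min_ratio=MIN_RATIO):
    """Return sizeof(deltas)/sizeof(base data), or 0.0 when the quoted
    rules bring the score down to zero."""
    if only_deletes or base_bytes <= 0:
        return 0.0
    ratio = delta_bytes / base_bytes
    return ratio if ratio >= min_ratio else 0.0


K, M = 1024, 1024 * 1024  # assumes the listing uses binary units

# Column block sizes from the rowset listing (var1..var6).
base_bytes = (2.80 * M + 100.7 * K + 4.95 * M
              + 1.58 * M + 8.82 * M + 2.7 * K)
redo_bytes = 14.04 * M

# The redo block holds only deletes (reinsert_count=[0], no column updates).
score_as_reported = major_compaction_score(redo_bytes, base_bytes,
                                           only_deletes=True)

# Same sizes, pretending the deltas also contained updates.
score_if_updates = major_compaction_score(redo_bytes, base_bytes,
                                          only_deletes=False)

print(score_as_reported, score_if_updates)
```

Under these assumptions the redo block is well above the minimum ratio by size, but because it contains only deletes the score is still zero, which would be consistent with the table staying uncompacted for more than a day.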