On Thu, Feb 23, 2023 at 6:41 PM John Naylor
<john.nay...@enterprisedb.com> wrote:
>
> I ran a couple "in situ" tests on server hardware using UUID columns, since 
> they are common in the real world and have bad correlation to heap order, so 
> are a challenge for index vacuum.

Thank you for the test!

>
> === test 1, delete everything from a small table, with very small 
> maintenance_work_mem:
>
> alter system set shared_buffers ='4GB';
> alter system set max_wal_size ='10GB';
> alter system set checkpoint_timeout ='30 min';
> alter system set autovacuum =off;
>
> -- unrealistically low
> alter system set maintenance_work_mem = '32MB';
>
> create table if not exists test (x uuid);
> truncate table test;
> insert into test (x) select gen_random_uuid() from 
> generate_series(1,50*1000*1000);
> create index on test (x);
>
> delete from test;
> vacuum (verbose, truncate off) test;
> --
>
> master:
> INFO:  finished vacuuming "john.naylor.public.test": index scans: 9
> system usage: CPU: user: 70.04 s, system: 19.85 s, elapsed: 802.06 s
>
> v29 patch:
> INFO:  finished vacuuming "john.naylor.public.test": index scans: 1
> system usage: CPU: user: 9.80 s, system: 2.62 s, elapsed: 36.68 s
>
> This is a bit artificial, but it's easy to construct cases where the array 
> leads to multiple index scans but the new tid store can fit everything without 
> breaking a sweat. I didn't save the progress reporting, but v29 was using 
> about 11MB for tid storage.

Cool.
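
For reference, the 9 index scans on master line up with a rough
back-of-the-envelope calculation (assuming the old dead-TID array spends
6 bytes per ItemPointerData and gets close to all of maintenance_work_mem):

select pg_size_bytes('32MB') / 6 as tids_per_pass,                             -- 5592405
       ceil(50000000 / ((pg_size_bytes('32MB') / 6)::numeric)) as index_scans; -- 9

So roughly 5.6 million dead TIDs fit per pass, i.e. 9 passes for 50 million,
whereas the tid store held all of them in ~11MB and needed only one.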

>
>
> === test 2: try to stress tid lookup with production maintenance_work_mem:
> 1. use unlogged table to reduce noise
> 2. vacuum freeze first to reduce heap scan time
> 3. delete some records at the beginning and end of heap to defeat binary 
> search's pre-check
>
> alter system set shared_buffers ='4GB';
> alter system set max_wal_size ='10GB';
> alter system set checkpoint_timeout ='30 min';
> alter system set autovacuum =off;
>
> alter system set maintenance_work_mem = '1GB';
>
> create unlogged table if not exists test (x uuid);
> truncate table test;
> insert into test (x) select gen_random_uuid() from 
> generate_series(1,1000*1000*1000);
> vacuum freeze test;
>
> select pg_size_pretty(pg_table_size('test'));
>  pg_size_pretty
> ----------------
>  41 GB
>
> create index on test (x);
>
> select pg_size_pretty(pg_total_relation_size('test'));
>  pg_size_pretty
> ----------------
>  71 GB
>
> select max(ctid) from test;
>      max
> --------------
>  (5405405,75)
>
> delete from test where ctid <  '(100000,0)'::tid;
> delete from test where ctid > '(5300000,0)'::tid;
>
> vacuum (verbose, truncate off) test;
>
> both:
> INFO:  vacuuming "john.naylor.public.test"
> INFO:  finished vacuuming "john.naylor.public.test": index scans: 1
> index scan needed: 205406 pages from table (3.80% of total) had 38000000 dead 
> item identifiers removed
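
If I did the math right, those figures match the two ctid-range deletes
exactly: 1e9 rows over 5405405 full pages plus 75 tuples on the last page
works out to 185 tuples per page, so

select 100000 + 105406 as pages,                    -- head 100000 + tail 105406 = 205406 (3.80%)
       (100000 + 105405) * 185 + 75 as dead_tuples; -- full pages * 185 + 75 on the last page = 38000000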
>
> --
> master:
> system usage: CPU: user: 134.32 s, system: 19.24 s, elapsed: 286.14 s
>
> v29 patch:
> system usage: CPU: user:  97.71 s, system: 45.78 s, elapsed: 573.94 s

In v29, vacuum took twice as long (573 s vs. 286 s on master)?

>
> The entire vacuum took 25% less wall clock time. Reminder that this is 
> without wal logging, and also unscientific because only one run.
>
> --
> I took 10 seconds of perf data while index vacuuming was going on (showing 
> calls > 2%):
>
> master:
>   40.59%  postgres  postgres            [.] vac_cmp_itemptr
>   24.97%  postgres  libc-2.17.so        [.] bsearch
>    6.67%  postgres  postgres            [.] btvacuumpage
>    4.61%  postgres  [kernel.kallsyms]   [k] copy_user_enhanced_fast_string
>    3.48%  postgres  postgres            [.] PageIndexMultiDelete
>    2.67%  postgres  postgres            [.] vac_tid_reaped
>    2.03%  postgres  postgres            [.] compactify_tuples
>    2.01%  postgres  libc-2.17.so        [.] __memcpy_ssse3_back
>
> v29 patch:
>
>   29.22%  postgres  postgres            [.] TidStoreIsMember
>    9.30%  postgres  postgres            [.] btvacuumpage
>    7.76%  postgres  postgres            [.] PageIndexMultiDelete
>    6.31%  postgres  [kernel.kallsyms]   [k] copy_user_enhanced_fast_string
>    5.60%  postgres  postgres            [.] compactify_tuples
>    4.26%  postgres  libc-2.17.so        [.] __memcpy_ssse3_back
>    4.12%  postgres  postgres            [.] hash_search_with_hash_value
>
> --
> master:
> psql -c "select phase, heap_blks_total, heap_blks_scanned, max_dead_tuples, 
> num_dead_tuples from pg_stat_progress_vacuum"
>        phase       | heap_blks_total | heap_blks_scanned | max_dead_tuples | num_dead_tuples
> -------------------+-----------------+-------------------+-----------------+-----------------
>  vacuuming indexes |         5405406 |           5405406 |       178956969 |        38000000
>
> v29 patch:
> psql  -c "select phase, heap_blks_total, heap_blks_scanned, 
> max_dead_tuple_bytes, dead_tuple_bytes from pg_stat_progress_vacuum"
>        phase       | heap_blks_total | heap_blks_scanned | max_dead_tuple_bytes | dead_tuple_bytes
> -------------------+-----------------+-------------------+----------------------+------------------
>  vacuuming indexes |         5405406 |           5405406 |           1073670144 |          8678064
>
> Here, the old array pessimistically needs 1GB allocated (as for any table > 
> ~5GB), but only fills 228MB for tid lookup. The patch reports 8.7MB. Tables 
> that only fit, say, 30-50 tuples per page will have less extreme differences 
> in memory use. Same for the case where only a couple dead items occur per 
> page, with many uninteresting pages in between. Even so, the allocation will 
> be much more accurately sized in the patch, especially in non-parallel vacuum.

Agreed.
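
As a cross-check on those numbers (again assuming 6 bytes per ItemPointerData
in the old array):

select 178956969::bigint * 6 as old_array_alloc,  -- 1073741814 bytes, i.e. the ~1GB cap
       38000000::bigint * 6  as old_array_fill;   -- 228000000 bytes, the ~228MB you mention

versus the 8678064 bytes (~8.7MB) reported by the tid store.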

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

