Dear hackers,

I ran some benchmarks with the patch. In more detail, I built a pub-sub replication system and measured TPS on the subscriber. The results show that performance can degrade when wal_receiver_status_interval is long. This is expected behavior, because the patch retains more dead tuples on the subscriber side. We also considered a new mechanism which dynamically tunes the period of status requests, and confirmed that it reduces the regression. This is described in the latter part.
The detailed report is below.
## Motivation - why the benchmark is needed
The v15 patch set introduces a new replication slot on the subscriber side to retain
the tuples needed for update_deleted detection.
However, this may affect the performance of query execution on the subscriber,
because 1) more tuples must be scanned, and 2) HOT updates can no longer be applied.
The second issue comes from the fact that a HOT update works only when the old and
new tuples are located on the same page.
Based on the above, I ran benchmark tests on the subscriber. The variable of the
measurement is wal_receiver_status_interval, which controls the interval between
status requests.
## Used source code
HEAD was 962da900, and the v15 patch set was applied atop it.
## Environment
An RHEL 7 machine with 755GB of memory, 4 physical CPUs, and 120 logical
processors.
## Workload
1. Constructed a pub-sub replication system.
Parameters for both instances were:
shared_buffers = 30GB
min_wal_size = 10GB
max_wal_size = 20GB
autovacuum = false
track_commit_timestamp = on (only for subscriber)
2. Ran pgbench in initialize mode. The scale factor was set to 100.
3. Ran pgbench with 60 clients against the subscriber. The duration was 120s,
and the measurement was repeated 5 times.
The attached script automates the above steps. You can specify the source type in
measure.sh and run it.
## Comparison
Performance testing was done for HEAD and the patched source code.
In the patched case, the "detect_update_deleted" parameter was set to on. Also,
wal_receiver_status_interval was varied across 1s, 10s, and 100s to check its
effect.
The table in Appendix [1] shows the results. The regression grows with
wal_receiver_status_interval:
the TPS regression is roughly 5% (interval=1s) -> 25% (interval=10s) -> 55%
(interval=100s).
The attached png file visualizes the results; each bar shows the median.
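As a sanity check, these percentages can be rederived from the median TPS values reported in Appendix [1]; the sketch below only recomputes the regression figures from those numbers:

```python
# Median TPS taken from Appendix [1]: (patched, HEAD) per interval setting
medians = {
    "1s": (56086, 59165),
    "10s": (45169, 59597),
    "100s": (26753, 59341),
}

def regression_pct(patched, head):
    """TPS regression of patched relative to HEAD, in percent."""
    return (1 - patched / head) * 100

for interval, (patched, head) in medians.items():
    print(f"interval={interval}: {regression_pct(patched, head):.1f}% regression")
```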
## Analysis
I attached to the backend via perf and found that heapam_index_fetch_tuple()
consumed much CPU time only in the patched case [2]. I also checked the
pg_stat_all_tables view and found that HOT updates rarely happened, again only in
the patched case [3].
This means that whether the backend can do HOT updates is the dominant factor.
When detect_update_deleted = on, the additional slot is defined on the subscriber
side and is advanced based on activity; the frequency is determined by
wal_receiver_status_interval. In the interval=100s case, the interval is relatively
long for this workload, so dead tuples remain, and this makes query processing
slower.
This result means that users may have to tune the interval based on their workload.
However, it is difficult to predict the appropriate value.
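To illustrate how stark the difference is, the HOT-update ratio can be derived from the pg_stat_all_tables counters in Appendix [3]; the numbers below are copied from the pgbench_accounts row of that appendix:

```python
# n_tup_upd / n_tup_hot_upd for pgbench_accounts, copied from Appendix [3]
patched = {"n_tup_upd": 453161, "n_tup_hot_upd": 0}
head = {"n_tup_upd": 2078197, "n_tup_hot_upd": 1911535}

def hot_ratio(stats):
    """Fraction of updates that were done as HOT updates."""
    return stats["n_tup_hot_upd"] / stats["n_tup_upd"]

print(f"patched: {hot_ratio(patched):.1%}")  # 0.0%
print(f"HEAD:    {hot_ratio(head):.1%}")     # 92.0%
```

So on HEAD roughly nine out of ten updates to pgbench_accounts are HOT, while with the patch and interval=100s none are, which matches the perf profile in [2].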
## Experiment - dynamic period tuning
Based on the above, Hou and I discussed off-list and implemented a new mechanism
which tunes the duration between status requests dynamically. The basic idea is
similar to what the slotsync worker does: the interval between requests is
initially 100ms, and is doubled when there has been no XID assignment since the
last advancement.
The maximum value is either wal_receiver_status_interval or 3min.
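The doubling scheme can be sketched as follows. This is only an illustrative model of the described behavior, not the code in v16-0002; in particular, resetting to 100ms on an XID assignment and taking the smaller of the two caps are my assumptions about the details:

```python
INITIAL_MS = 100
CEILING_MS = 3 * 60 * 1000  # the 3min upper bound mentioned above

def next_request_interval(current_ms, xid_assigned, status_interval_ms):
    """Compute the next delay before sending a status request.

    While no XID has been assigned since the last slot advancement the
    interval doubles, so an idle subscriber requests rarely; any
    activity drops it back to the initial 100ms (assumed behavior).
    """
    cap = min(status_interval_ms, CEILING_MS)  # assumed: smaller of the two
    if xid_assigned:
        return INITIAL_MS
    return min(current_ms * 2, cap)
```

With wal_receiver_status_interval = 10s, an idle period grows the interval 100ms -> 200ms -> 400ms -> ... -> 10s, while a busy workload keeps the 100ms cadence, which would explain why the regression disappears under this scheme.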
Benchmark results with this mechanism are shown in [4]. Here
wal_receiver_status_interval is unchanged, so we can compare against the HEAD
interval=10s case in [1] - 59536 vs 59597.
The regression is less than 1%.
The idea has already been included in v16-0002; please refer to it.
## Experiment - shorter interval
Just in case, I also tested an extreme setting where wal_receiver_status_interval
is quite short - 10ms.
To make the interval that short, I applied the attached patch in both cases.
Results are shown in [5].
No regression is observed; the patched case is even slightly better (I think this
is due to randomness).
This experiment also supports the conclusion that the regression is caused by the
retained dead tuples.
## Appendix [1] - result table
Each cell shows transactions per second for one run.
patched
# run   interval=1s   interval=10s   interval=100s
1       55876         45288          26956
2       56086         45336          26799
3       56121         45129          26753
4       56310         45169          26542
5       55389         45071          26735
median  56086         45169          26753
HEAD
# run   interval=1s   interval=10s   interval=100s
1       59096         59343          59341
2       59671         59413          59281
3       59131         59597          58966
4       59239         59693          59518
5       59165         59631          59487
median  59165         59597          59341
## Appendix [2] - perf analysis
patched:
```
- 58.29% heapam_index_fetch_tuple
+ 38.28% heap_hot_search_buffer
+ 13.88% ReleaseAndReadBuffer
5.34% heap_page_prune_opt
```
head:
```
- 2.13% heapam_index_fetch_tuple
1.06% heap_hot_search_buffer
0.62% heap_page_prune_opt
```
## Appendix [3] - pg_stat
patched
```
postgres=# SELECT relname, n_tup_upd, n_tup_hot_upd, n_tup_newpage_upd,
           n_tup_upd - n_tup_hot_upd AS n_tup_non_hot
           FROM pg_stat_all_tables WHERE relname LIKE 'pgbench%';
     relname      | n_tup_upd | n_tup_hot_upd | n_tup_newpage_upd | n_tup_non_hot
------------------+-----------+---------------+-------------------+---------------
 pgbench_history  |         0 |             0 |                 0 |             0
 pgbench_tellers  |    453161 |         37996 |            415165 |        415165
 pgbench_accounts |    453161 |             0 |            453161 |        453161
 pgbench_branches |    453161 |        272853 |            180308 |        180308
(4 rows)
```
HEAD
```
postgres=# SELECT relname, n_tup_upd, n_tup_hot_upd, n_tup_newpage_upd,
           n_tup_upd - n_tup_hot_upd AS n_tup_non_hot
           FROM pg_stat_all_tables WHERE relname LIKE 'pgbench%';
     relname      | n_tup_upd | n_tup_hot_upd | n_tup_newpage_upd | n_tup_non_hot
------------------+-----------+---------------+-------------------+---------------
 pgbench_history  |         0 |             0 |                 0 |             0
 pgbench_tellers  |   2078197 |       2077583 |               614 |           614
 pgbench_accounts |   2078197 |       1911535 |            166662 |        166662
 pgbench_branches |   2078197 |       2078197 |                 0 |             0
(4 rows)
```
## Appendix [4] - dynamic status request
# run dynamic (v15 + PoC)
1 59627
2 59536
3 59359
4 59443
5 59541
median 59536
## Appendix [5] - shorter wal_receiver_status_interval
patched
# run interval=10ms
1 58081
2 57876
3 58083
4 57915
5 57933
median 57933
HEAD
# run interval=10ms
1 57595
2 57322
3 57271
4 57421
5 57590
median 57421
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Attachments:
- change_to_ms.diffs
- measure.sh
- setup.sh
