On Sat, Sep 23, 2023 at 3:53 PM Melanie Plageman
<melanieplage...@gmail.com> wrote:
>
> Workload F:
>
> +------+--------+---------------------+--------------------+------------------+
> | algo | WAL GB | cptr bgwriter writes| other reads/writes | IO time AV 
> worker|
> +------+--------+---------------------+---------------------+-----------------+
> |    M |    173 |           1,202,231 |         53,957,448 |           12,389 
> |
> |    4 |    189 |           1,212,521 |         55,589,140 |           13,084 
> |
> |    5 |    173 |           1,194,242 |         54,260,118 |           13,407 
> |
> +------+--------+---------------------+--------------------+------------------+
>
> +------+--------------+
> | algo | P99 latency  |
> +------+--------------+
> |    M |       19875  |
> |    4 |       19314  |
> |    5 |       19701  |
> +------+--------------+

Andres mentioned that the P99 latency for the COPY workload (workload F)
might not be meaningful, so I have calculated the duration total, mean,
median, min, max and standard deviation in milliseconds.

Workload F:
+------+------------+-------+--------+--------+--------+---------+
| algo |      Total |   Mean| Median |    Min |    Max |  Stddev |
+------+------------+-------+--------+--------+--------+---------+
|    M |  1,270,903 | 18,155| 17,755 | 17,090 | 19,994 |     869 |
|    4 |  1,167,135 | 16,673| 16,421 | 15,585 | 19,485 |     811 |
|    5 |  1,250,145 | 17,859| 17,704 | 15,763 | 19,871 |   1,009 |
+------+------------+-------+--------+--------+--------+---------+

Interestingly, algorithm 4 had the lowest total duration for all COPYs.
Some investigation of other data collected during the runs led us to
believe this may be due to autovacuum workers doing more IO with
algorithm 4 and thus generating more WAL and ending up initializing more
WAL files themselves. Whereas on master and with algorithm 5, client
backends had to initialize WAL files themselves, leading COPYs to take
longer. This was supported by the presence of more WALInit wait events
for client backends on master and with algorithm 5.

Calculating these made me realize that my conclusions about the work
queue workload (workload I) didn't make much sense. Because this
workload updated a non-indexed column, most pruning was HOT pruning done
on access and basically no page freezing was done by vacuum. This means
we weren't seeing negative performance effects of freezing related to
the work queue table.

The difference in this benchmark came from the relatively poor
performance of the concurrent COPYs when that table was frozen more
aggressively. I plan to run a new version of this workload which updates
an indexed column for comparison and does not use a concurrent COPY.

This is the duration total, mean, median, min, max, and standard
deviation in milliseconds of the COPYs which ran concurrently with the
work queue pgbench.

Workload I COPYs:
+------+--------+-------+--------+--------+--------+---------+
| algo |  Total |  Mean | Median |   Min  |   Max  |  Stddev |
+------+--------+-------+--------+--------+--------+---------+
|    M | 191,032|  4,898|  4,726 |  4,486 |  9,353 |     800 |
|    4 | 193,534|  4,962|  4,793 |  4,533 |  9,381 |     812 |
|    5 | 194,351|  4,983|  4,771 |  4,617 |  9,159 |     783 |
+------+--------+-------+--------+--------+--------+---------+

I think this shows that algorithm 4 COPYs performed the worst. This is
in contrast to the COPY-only workload (F) which did not show worse
performance for algorithm 4. I think this means I should modify the work
queue example and use something other than concurrent COPYs to avoid
obscuring characteristics of the work queue example.

- Melanie


Reply via email to