On 5/21/24 6:40 AM, John Naylor wrote:
On Mon, May 20, 2024 at 8:41 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:

On Mon, May 20, 2024 at 8:47 PM Jonathan S. Katz <jk...@postgresql.org> wrote:

On 5/20/24 2:58 AM, John Naylor wrote:
Hi Jon,

Regarding vacuum "has shown up to a 6x improvement in overall time to
complete its work" -- I believe I've seen reported numbers close to
that only 1) when measuring the index phase in isolation, or maybe 2)
when measuring the entire vacuum of unlogged tables with one
perfectly-correlated index (testing has less variance with WAL out of
the picture). I
believe tables with many indexes would show a lot of improvement, but
I'm not aware of testing that case specifically. Can you clarify where
6x came from?

Sawada-san showed me the original context, but I can't quickly find it
in the thread. Sawada-san, can you please share the numbers behind this?


I referenced the numbers that I measured during development[1] (test
scripts are here[2]). IIRC I used unlogged tables and indexes, and
these numbers were for the entire vacuum execution time, including
heap scanning, index vacuuming, and heap vacuuming.
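
For a rough idea of the shape of such a test (just a sketch here --
table name, row count, and deleted fraction are made up, and the real
scripts are in [2]):

-- Illustrative only: an unlogged table with a single index correlated
-- with insertion order, vacuumed after deleting a slice of rows.
-- \timing in psql gives the wall-clock comparison between v16 and v17.
CREATE UNLOGGED TABLE vac_bench (id int) WITH (autovacuum_enabled = off);
INSERT INTO vac_bench SELECT generate_series(1, 50000000);
CREATE INDEX ON vac_bench (id);
DELETE FROM vac_bench WHERE id % 5 = 0;
\timing on
VACUUM vac_bench;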

Thanks for confirming.

The wording "has a new internal data structure that reduces memory
usage and has shown up to a 6x improvement in overall time to complete
its work" is specific for runtime, and the memory use is less
specific. Unlogged tables are not the norm, so I'd be cautious of
reporting numbers specifically designed (for testing) to isolate the
thing that changed.

I'm wondering if it might be both more impressive-sounding and more
realistic for the average user's experience to reverse that: be
specific about memory, and less specific about speed. The best-case
memory reduction occurs for update patterns that are highly localized,
such as updates confined to the most recently inserted records, and
I'd say those are a lot more common than the use of unlogged tables.
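
To illustrate what I mean by "localized" (names and sizes below are
invented for illustration, not taken from any particular benchmark),
the favorable case is one where the dead TIDs all land on a contiguous
range of heap pages:

-- Illustrative sketch: dead tuples confined to the most recently
-- inserted part of the heap, the pattern where the new TID storage
-- compresses best.
CREATE TABLE recent_churn (id int, payload text)
  WITH (autovacuum_enabled = off);
INSERT INTO recent_churn
  SELECT g, repeat('x', 100) FROM generate_series(1, 10000000) g;
DELETE FROM recent_churn WHERE id > 9000000;
VACUUM (VERBOSE) recent_churn;

While that runs, memory use can be watched from another session via
pg_stat_progress_vacuum (whose dead-tuple columns are reported in
bytes as of v17, if I'm remembering the rename correctly).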

Maybe something like "has a new internal data structure that reduces
overall time to complete its work and can use up to 20x less memory."

Now, it is true that when dead tuples are sparse and evenly spaced
(e.g. one every 100 pages), vacuum can now actually use more memory
than in v16. However, the nature of that scenario also means that the
number of TIDs just can't get very big to begin with. On the runtime
side, while the improvement for normal (logged) tables is likely not
earth-shattering, I believe vacuum will always be at least somewhat
faster than in v16, and never slower.
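
For completeness, the unfavorable sparse case can be simulated like
this (the spacing is just back-of-the-envelope arithmetic for a
single-int table at roughly 226 rows per 8kB page, not from a real
test):

-- Illustrative worst case: dead TIDs spread roughly one per 100 heap
-- pages. The total dead-TID count stays small, so even if per-TID
-- memory is worse than v16's array, the absolute numbers are tiny.
CREATE TABLE sparse_dead (id int) WITH (autovacuum_enabled = off);
INSERT INTO sparse_dead SELECT generate_series(1, 10000000);
DELETE FROM sparse_dead WHERE id % 22600 = 0;
VACUUM (VERBOSE) sparse_dead;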

Thanks for the feedback. I flipped it around, per your suggestion:

"has a new internal data structure that has shown up to a 20x memory reduction for vacuum, along with improvements in overall time to complete its work."

Thanks,

Jonathan
