On 5/21/24 6:40 AM, John Naylor wrote:
> On Mon, May 20, 2024 at 8:41 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
>> On Mon, May 20, 2024 at 8:47 PM Jonathan S. Katz <jk...@postgresql.org> wrote:
>>> On 5/20/24 2:58 AM, John Naylor wrote:
>>>> Hi Jon,
>>>>
>>>> Regarding vacuum "has shown up to a 6x improvement in overall time to complete its work" -- I believe I've seen reported numbers close to that only 1) when measuring the index phase in isolation or maybe 2) the entire vacuum of unlogged tables with one, perfectly-correlated index (testing has less variance with WAL out of the picture). I believe tables with many indexes would show a lot of improvement, but I'm not aware of testing that case specifically. Can you clarify where 6x came from?
>>>
>>> Sawada-san showed me the original context, but I can't rapidly find it in the thread. Sawada-san, can you please share the numbers behind this?
>>
>> I referenced the numbers that I measured during the development[1] (test scripts are here[2]). IIRC I used unlogged tables and indexes, and these numbers were the entire vacuum execution time including heap scanning, index vacuuming and heap vacuuming.
>
> Thanks for confirming.
>
> The wording "has a new internal data structure that reduces memory usage and has shown up to a 6x improvement in overall time to complete its work" is specific for runtime, and the memory use is less specific. Unlogged tables are not the norm, so I'd be cautious of reporting numbers specifically designed (for testing) to isolate the thing that changed.
>
> I'm wondering if it might be both more impressive-sounding and more realistic for the average user experience to reverse that: specific on memory, and less specific on speed. The best-case memory reduction occurs for table update patterns that are highly localized, such as the most recently inserted records, and I'd say those are a lot more common than the use of unlogged tables.
>
> Maybe something like "has a new internal data structure that reduces overall time to complete its work and can use up to 20x less memory."
>
> Now, it is true that when dead tuples are sparse and evenly spaced (e.g. 1 every 100 pages), vacuum can now actually use more memory than v16. However, the nature of that scenario also means that the number of TIDs just can't get very big to begin with. In contrast, while the runtime improvement for normal (logged) tables is likely not earth-shattering, I believe it will always be at least somewhat faster, and never slower.
Thanks for the feedback. I flipped it around, per your suggestion:

"has a new internal data structure that has shown up to a 20x memory reduction for vacuum, along with improvements in overall time to complete its work."
Thanks, Jonathan