Hi Mircea, Hayato,

I ran a few more tests on 19devel, focusing on the partitioned case to better understand the performance behavior.
For scale 500, serial initialization on my system takes around 34.3 seconds. Parallel initialization without partitions (-j 10) makes the client-side data generation noticeably faster, but the overall runtime ends up slightly higher because the vacuum phase becomes much longer. However, when running with partitions (pgbench -i -s 500 --partitions=10 -j 10), the total runtime drops to about 21.9 seconds, and the vacuum cost is much smaller.

I also verified that the row counts are correct in all cases, and the regression tests still pass locally.

So it looks like the main benefit of parallel initialization shows up clearly in the partitioned setup, which matches the expectations discussed earlier.

Just sharing these observations in case they are useful for the ongoing review. Thanks again for the work on this patch.

Best regards,
Lakshmi

On Wed, Feb 11, 2026 at 5:53 PM Hayato Kuroda (Fujitsu) <[email protected]> wrote:

> Dear Mircea,
>
> Thanks for the proposal. I also feel the initialization wastes time.
> Here are my initial comments.
>
> 01.
> I found that pgbench raises a FATAL in case of -j > --partitions, is there a
> specific reason?
> If needed, we may choose the softer way, which adjust nthreads up to the number
> of partitions. -c and -j do the similar one:
>
> ```
> if (nthreads > nclients && !is_init_mode)
>     nthreads = nclients;
> ```
>
> 02.
> Also, why is -j accepted in case of non-partitions?
>
> 03.
> Can we port all validation to main()? I found initPopulateTableParallel() has
> such a part.
>
> 04.
> Copying seems to be divided into chunks per COPY_BATCH_SIZE. Is it really
> essential to parallelize the initialization? I feel it may optimize even
> serialized case thus can be discussed independently.
>
> 05.
> Per my understanding, each thread creates its tables, and all of them are
> attached to the parent table. Is it right? I think it needs more code
> changes, and I am not sure it is critical to make initialization faster.
>
> So I suggest using the incremental approach. The first patch only parallelizes
> the data load, and the second patch implements the CREATE TABLE and ALTER TABLE
> ATTACH PARTITION. You can benchmark three patterns, master, 0001, and
> 0001 + 0002, then compare the results. IIUC, this is the common approach to
> reduce the patch size and make them more reviewable.
>
> 06.
> Missing update for typedefs.list. WorkerTask and CopyTarget can be added there.
>
> 07.
> Since there is a report like [1], you can benchmark more cases.
>
> [1]: https://www.postgresql.org/message-id/CAEvyyTht69zjnosPjziW6dqNLqs-n6eKia2vof108zQp1QFX%3DQ%40mail.gmail.com
>
> Best regards,
> Hayato Kuroda
> FUJITSU LIMITED
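
For reference, the softer handling suggested in comment 01 could look roughly like the sketch below. This is only an illustration and is not taken from the patch: clamp_init_threads() is a hypothetical helper name, and leaving -j untouched when no partitions are requested is an assumption, since that case is exactly the open question in comment 02.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of the patch): instead of raising a
 * FATAL when -j exceeds --partitions, cap the number of initialization
 * threads at the number of partitions, mirroring how pgbench already
 * caps nthreads at nclients outside of init mode.
 */
static int
clamp_init_threads(int nthreads, int partitions)
{
	assert(nthreads >= 1);
	assert(partitions >= 0);

	/*
	 * Assumption: partitions == 0 means an unpartitioned table, and
	 * the thread count is left as requested in that case.
	 */
	if (partitions == 0)
		return nthreads;

	/* More threads than partitions would leave some threads idle. */
	if (nthreads > partitions)
		return partitions;

	return nthreads;
}
```

With this, a run with 16 threads and --partitions=10 would proceed with 10 threads, while 4 threads with --partitions=10 would stay at 4.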
