Hello Egor,

Thank you very much for taking this patch under your wing!
> I propose, by analogy with the existing 'g'/'G' modes, to use lowercase
> letters for client-side data generation and uppercase letters for
> server-side generation. Furthermore, I propose considering making the
> "one transaction per scale" mode a separate setting. This would result
> in the following modes:
> 1. g: COPY .. FROM STDIN .. TEXT, single transaction (orig. mode)
> 2. c: COPY .. FROM STDIN .. BINARY, single transaction (added mode)
> 3. G: INSERT .. SELECT generate_series, single transaction (orig. mode)
> 4. I: INSERT .. SELECT unnest, single transaction (added mode)
> And: M: multiple transactions. A setting that, when used, makes a mode
> run with a transaction for each scale instead of a single transaction.
> This would yield 8 possible combinations.

Sure thing. I agree with your proposal to add more flexibility to the
parameters, with a single exception: for the UNNEST mode I would suggest
"U" instead of "I", since "I" might become confusing later if another
patch from the current CommitFest makes it into master. I'm referring to:

https://commitfest.postgresql.org/patch/6242/

That patch uses the parameter "-i" to populate tables with multiple
threads. (A rough sketch of the two server-side generation variants is
in the P.S. below.)

> It would be reasonable to first collect performance measurements for
> these modes and then decide whether to keep them, before proceeding with
> a full implementation including their selection.

Since the logic will be slightly different under your proposal, a new
set of metrics will certainly be required.

My main motivation for splitting the one huge table-filling transaction
into smaller ones comes from another idea that was put on the back
burner: running data population in multiple threads. That idea is
implemented in the above-mentioned patch by Mircea Cadariu. Judging by
the amount of changes there, the two patches are roughly equal in line
count, so folding that work into my patch would be overwhelming for any
reviewer.

Another reason for smaller ("one per scale") transactions is my
experience generating test databases much bigger than the host's RAM
(e.g., scale=5000). The data population phase is not just slow: such a
single transaction frequently forces multiple checkpoints, because even
my max_wal_size was smaller than the size of that one "change". One
might argue that my DB is not tuned properly, but that is a topic for
another day.

As a side effect, the decision to use multiple transactions raises
another issue: the FREEZE optimisation for COPY can no longer be used,
because COPY ... FREEZE requires the target table to have been created
or truncated in the same (sub)transaction. That in turn leads to an
autovacuum storm during the very process of data population (see the
P.P.S. below for an illustration).

> Thus, I propose reconsidering the approach to data generation modes
> and adding a setting to control the number of transactions.
> I also suggest conducting new, more accurate performance measurements to
> inform the decision on the necessity of the additional generation modes.

Agree and agree: both points make perfect sense.

Best regards,
Boris
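P.S. A rough sketch of the difference between the two server-side
generation variants, for anyone skimming the thread. This is simplified
SQL, not the exact statements the patch issues; in particular, how the
array for unnest() is built and bound is an assumption on my part, and
:scale stands in for the scale factor:

    -- 'G' (existing): the server generates the rows by itself.
    INSERT INTO pgbench_accounts (aid, bid, abalance, filler)
        SELECT aid, (aid - 1) / 100000 + 1, 0, ''
        FROM generate_series(1, 100000 * :scale) AS aid;

    -- 'I'/'U' (added): rows are expanded from a pre-built array of
    -- ids, e.g. bound as a query parameter, one INSERT per batch.
    INSERT INTO pgbench_accounts (aid, bid, abalance, filler)
        SELECT aid, (aid - 1) / 100000 + 1, 0, ''
        FROM unnest($1::int[]) AS aid;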

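P.P.S. The FREEZE point above, illustrated. Again a simplified sketch
(pgbench's real init sequence differs in details), but the rule it
rests on is documented PostgreSQL behaviour: COPY ... WITH (FREEZE) is
only allowed when the table was created or truncated in the current
subtransaction.

    -- Single transaction: TRUNCATE and COPY share a transaction, so
    -- FREEZE is permitted and the rows are written already frozen.
    BEGIN;
    TRUNCATE pgbench_accounts;
    COPY pgbench_accounts FROM STDIN WITH (FREEZE);
    COMMIT;

    -- One transaction per scale: from the second transaction on, the
    -- table was not truncated in the current transaction, so FREEZE
    -- would raise an error and plain COPY must be used, leaving all
    -- those rows for autovacuum to freeze -- hence the storm.
    BEGIN;
    COPY pgbench_accounts FROM STDIN;  -- WITH (FREEZE) would fail here
    COMMIT;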