On 5/30/23 10:05 PM, David Rowley wrote:

> My understanding had been that concurrency was required, but I see the
> commit message for 00d1e02be mentions:
>
> Even single threaded
> COPY is measurably faster, primarily due to not dirtying pages while
> extending, if supported by the operating system (see commit 4d330a61bb1).
>
> If that's the case then maybe the beta release notes could be edited
> slightly to reflect this. Maybe something like:
>
> "Relation extensions have been improved allowing faster bulk loading
> of data using COPY. These improvements are more significant when
> multiple processes are concurrently loading data into the same table."
>
> The current text of "PostgreSQL 16 can also improve the performance of
> concurrent bulk loading of data using COPY up to 300%." does lead me
> to believe that nothing has been done to improve things when only a
> single backend is involved.

Typically once a release announcement is out, we'll only edit it if it's inaccurate. I don't think the statement in the release announcement is inaccurate, as it specifies that concurrent bulk loading is faster.

I had based the description on what Andres described in the original discussion and through reading[1], which showed a "measurable" improvement as the commit message said, but not to the same degree as when loading concurrently. It does still seem impactful -- the results show up to 20% improvement on a single backend -- but the bigger story was around the concurrency.

I'm -0.5 for revising the announcement, but I also don't want people to miss out on testing this. I'd be OK with this:

"PostgreSQL 16 can also improve the performance of bulk loading of data, with some tests showing up to 300% improvement when concurrently executing `COPY` commands."

Thanks,

Jonathan

[1] https://www.postgresql.org/message-id/20221029025420.eplyow6k7tgu6...@awork3.anarazel.de
