On Mon, Feb 13, 2023 at 11:09 PM Andres Freund <and...@anarazel.de> wrote: > > > Later code assigns iov[0].iov_len thus we need to provide a separate > > iov non-const variable, or can we use pwrite instead there? (I didn't > > find pg_pwrite_with_retry(), though) > > Given that we need to do that, and given that we already need to loop to > handle writes that are longer than PG_IOV_MAX * BLCKSZ, it's probably not > worth avoiding iov initialization. > > But I think it's worth limiting the initialization to blocks.
We can still optimize away the for loop by using a single iovec for remaining size, like the attached v2 patch. > I'd also try to combine the first pg_writev_* with the second one. Done, PSA v2 patch. On Tue, Feb 14, 2023 at 6:40 AM Andres Freund <and...@anarazel.de> wrote: > > Hi, > > On 2023-02-12 09:31:36 -0800, Andres Freund wrote: > > Another thing: I think we should either avoid iterating over all the IOVs if > > we don't need them, or, even better, initialize the array as a constant, > > once. > > I just tried to use pg_pwrite_zeros - and couldn't because it doesn't have an > offset parameter. Huh, what lead to the function being so constrained? Done, PSA v2 patch. We could do few more things, but honestly I feel they're unnecessary: 1) An assert-only code that checks if the asked file contents are zeroed at the end of pg_pwrite_zeros (to be more defensive, but reading 16MB files and checking if it's zero-filled will surely slowdown the Assert builds). 2) A small test module passing in a file with the size to write isn't multiple of block size, meaning, the code we have in the function to write last remaining bytes (less than BLCKSZ) gets covered which isn't covered right now - https://coverage.postgresql.org/src/common/file_utils.c.gcov.html. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
From 4fd58060fa3fc301dff2b5fb38c105ad91bc8be4 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Date: Tue, 14 Feb 2023 10:12:34 +0000 Subject: [PATCH v2] Refactor pg_pwrite_zeros() This commit changes pg_pwrite_zeros() as follows: 1. Optimize zero buffer handling in pg_pwrite_zeros() by using static const zero-filled buffer to avoid memset-ing on every call. 2. Optimize zero buffer handling in pg_pwrite_zeros() by using static iovec and point the zero-filled buffer for the first time through to avoid for loop initializing iovec on every call. 3. Add offset parameter to pg_pwrite_zeros() to let callers specify the file position from where to zero-fill. 4. Combine the logic writing the last remaining bytes (less than one full block) into main for loop to avoid code duplication. Suggested-by: Andres Freund Author: Bharath Rupireddy Discussion: https://www.postgresql.org/message-id/20230211224424.r25uw4rsv6taukxk%40awork3.anarazel.d --- src/backend/access/transam/xlog.c | 2 +- src/bin/pg_basebackup/walmethods.c | 2 +- src/common/file_utils.c | 125 ++++++++++++++++++----------- src/include/common/file_utils.h | 2 +- 4 files changed, 82 insertions(+), 49 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index f9f0f6db8d..786b26054c 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -2981,7 +2981,7 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli, * indirect blocks are down on disk. Therefore, fdatasync(2) or * O_DSYNC will be sufficient to sync future writes to the log file. */ - rc = pg_pwrite_zeros(fd, wal_segment_size); + rc = pg_pwrite_zeros(fd, wal_segment_size, 0); if (rc < 0) save_errno = errno; diff --git a/src/bin/pg_basebackup/walmethods.c b/src/bin/pg_basebackup/walmethods.c index 54014e2b84..6d14b988cb 100644 --- a/src/bin/pg_basebackup/walmethods.c +++ b/src/bin/pg_basebackup/walmethods.c @@ -222,7 +222,7 @@ dir_open_for_write(WalWriteMethod *wwmethod, const char *pathname, { ssize_t rc; - rc = pg_pwrite_zeros(fd, pad_to_size); + rc = pg_pwrite_zeros(fd, pad_to_size, 0); if (rc < 0) { diff --git a/src/common/file_utils.c b/src/common/file_utils.c index 4dae534152..7faa7a72f6 100644 --- a/src/common/file_utils.c +++ b/src/common/file_utils.c @@ -531,72 +531,105 @@ pg_pwritev_with_retry(int fd, const struct iovec *iov, int iovcnt, off_t offset) /* * pg_pwrite_zeros * - * Writes zeros to file worth "size" bytes, using vectored I/O. + * Writes zeros to file worth "size" bytes from starting file offset 'offset', + * using vectored I/O. * * Returns the total amount of data written. On failure, a negative value * is returned with errno set. + * + * Note that pg_pwrite(), which gets called from this function, may change the + * file position (see win32pwrite.c). If needed, callers must deal with it. */ ssize_t -pg_pwrite_zeros(int fd, size_t size) +pg_pwrite_zeros(int fd, size_t size, off_t offset) { - PGAlignedBlock zbuffer; /* worth BLCKSZ */ - size_t zbuffer_sz; - struct iovec iov[PG_IOV_MAX]; - int blocks; - size_t remaining_size = 0; - int i; - ssize_t written; - ssize_t total_written = 0; - - zbuffer_sz = sizeof(zbuffer.data); - - /* Zero-fill the buffer. */ - memset(zbuffer.data, 0, zbuffer_sz); - - /* Prepare to write out a lot of copies of our zero buffer at once. */ - for (i = 0; i < lengthof(iov); ++i) +#define WritingLastBytes(last, remaining) (last != 0 && last == remaining) + + /* + * Zero-filled buffer worth BLCKSZ. We use static variable to avoid zeroing + * the buffer on every call. Also, with the const qualifier, it is ensured + * that the buffer is placed in read-only section of the process's memory. + */ + static const PGAlignedBlock zbuffer = {0}; + static size_t zbuffer_sz = BLCKSZ; + static bool first_time = true; + static struct iovec iov[PG_IOV_MAX] = {0}; + int blocks; + int i; + ssize_t nwritten; + ssize_t ntotalwritten = 0; + size_t nremaining; + size_t nlast; + + /* + * If first time through, point the zero buffer to iovec structure. This + * avoids for loop on every call. + */ + if (first_time) { - iov[i].iov_base = zbuffer.data; - iov[i].iov_len = zbuffer_sz; + /* Prepare to write out a lot of copies of our zero buffer at once. */ + for (i = 0; i < lengthof(iov); ++i) + { + iov[i].iov_base = (void *) (unconstify(PGAlignedBlock *, &zbuffer)->data); + iov[i].iov_len = zbuffer_sz; + } + + first_time = false; } - /* Loop, writing as many blocks as we can for each system call. */ blocks = size / zbuffer_sz; - remaining_size = size % zbuffer_sz; - for (i = 0; i < blocks;) - { - int iovcnt = Min(blocks - i, lengthof(iov)); - off_t offset = i * zbuffer_sz; + nremaining = size; + nlast = size % zbuffer_sz; + i = 0; - written = pg_pwritev_with_retry(fd, iov, iovcnt, offset); + /* Loop, writing as many blocks as we can for each system call. */ + while (nremaining > 0) + { + int iovcnt; - if (written < 0) - return written; + /* Determine if we need to write last bytes (< zbuffer_sz) */ + if (WritingLastBytes(nlast, nremaining)) + { + /* Write last bytes in this cycle */ + Assert(nlast < zbuffer_sz); - i += iovcnt; - total_written += written; - } + iov[0].iov_len = nlast; - /* Now, write the remaining size, if any, of the file with zeros. */ - if (remaining_size > 0) - { - /* We'll never write more than one block here */ - int iovcnt = 1; + /* We will write not more than one block */ + iovcnt = 1; + } + else + iovcnt = Min(blocks - i, lengthof(iov)); - /* Jump on to the end of previously written blocks */ - off_t offset = i * zbuffer_sz; + nwritten = pg_pwritev_with_retry(fd, iov, iovcnt, offset); - iov[0].iov_len = remaining_size; + /* Let's make the iovec that's used for writing last bytes reusable */ + if (WritingLastBytes(nlast, nremaining)) + { + Assert(iov[0].iov_len == nlast); + iov[0].iov_len = zbuffer_sz; + } - written = pg_pwritev_with_retry(fd, iov, iovcnt, offset); + if (nwritten < 0) + return nwritten; - if (written < 0) - return written; + i += iovcnt; + ntotalwritten += nwritten; + nremaining -= nwritten; + offset += nwritten; - total_written += written; +#ifdef USE_ASSERT_CHECKING + /* + * If we've written last bytes, it means that we're done writing all + * bytes and going to exit. + */ + if (WritingLastBytes(nlast, nremaining)) + Assert(nremaining == 0); +#endif } + Assert(ntotalwritten == size); - Assert(total_written == size); + return ntotalwritten; - return total_written; +#undef WritingLastBytes } diff --git a/src/include/common/file_utils.h b/src/include/common/file_utils.h index bda6d3a541..b7efa1226d 100644 --- a/src/include/common/file_utils.h +++ b/src/include/common/file_utils.h @@ -44,6 +44,6 @@ extern ssize_t pg_pwritev_with_retry(int fd, int iovcnt, off_t offset); -extern ssize_t pg_pwrite_zeros(int fd, size_t size); +extern ssize_t pg_pwrite_zeros(int fd, size_t size, off_t offset); #endif /* FILE_UTILS_H */ -- 2.34.1