Hello hackers, A couple of years ago, Oskari Saarenmaa proposed a patch[1] to adopt $SUBJECT. Last year, Tobias Oberstein argued again that we should do that[2]. At the end of that thread there was a +1 from multiple committers in support of getting it done for PostgreSQL 12. Since Oskari hasn't returned, I decided to dust off his patch and start a new thread.
Here is a rebase and an initial review. I plan to address the problems myself, unless Oskari wants to do that in which case I'll happily review and hopefully eventually commit it. It's possible I introduced new bugs while rebasing since basically everything moved around, but it passes check-world here with and without HAVE_PREAD and HAVE_PWRITE defined. Let me summarise what's going on in the patch: 1. FileRead() and FileWrite() are replaced with FileReadAt() and FileWriteAt(), taking a new offset argument. Now we can do one syscall instead of two for random reads and writes. 2. fd.c no longer tracks seek position, so we lose a bunch of cruft from the vfd machinery. FileSeek() now only supports SEEK_SET and SEEK_END. This is taking the position that we no longer want to be able to do file IO with implicit positioning at all. I think that seems reasonable: it's nice to drop all that position tracking and caching code, and the seek-based approach would be incompatible with any multithreading plans. I'm not even sure I'd bother adding "At" to the function names. If there are any extensions that want the old interface they will fail to compile either way. Note that the BufFile interface remains the same, so hopefully that covers many use cases. I guess the only remaining reason to use FileSeek() is to get the file size? So I wonder why SEEK_SET remains valid in the patch... if my suspicion is correct that only SEEK_END still has a reason to exist, perhaps we should just kill FileSeek() and add FileSize() or something instead? pgstat_report_wait_start(wait_event_info); +#ifdef HAVE_PREAD + returnCode = pread(vfdP->fd, buffer, amount, offset); +#else + lseek(VfdCache[file].fd, offset, SEEK_SET); returnCode = read(vfdP->fd, buffer, amount); +#endif pgstat_report_wait_end(); This obviously lacks error handling for lseek(). I wonder if anyone would want separate wait_event tracking for the lseek() (though this codepath would be used by almost nobody, especially if someone adds Windows support, so it's probably not worth bothering with). I suppose we could assume that if you have pread() you must also have pwrite() and save on ./configure cycles. I will add this to the next Festival of Commits, though clearly it needs a bit more work before the festivities begin. [1] https://www.postgresql.org/message-id/flat/a86bd200-ebbe-d829-e3ca-0c4474b2fcb7%40ohmu.fi [2] https://www.postgresql.org/message-id/flat/b8748d39-0b19-0514-a1b9-4e5a28e6a208%40gmail.com -- Thomas Munro http://www.enterprisedb.com
0001-Use-pread-pwrite-instead-of-lseek-read-write-v1.patch
Description: Binary data