On Tue, Mar 20, 2018 at 10:55:04AM -0400, Peter Eisentraut wrote: > On 3/19/18 22:58, Michael Paquier wrote: >> - Extend copy_file so as it is able to use fcopyfile. > > fcopyfile() does not support cloning. (This is not documented.)
You are right. I have been reading the documentation here to get an idea as I don't have a macos system at hand: https://www.unix.com/man-page/osx/3/fcopyfile/ However I have bumped into that: http://www.openradar.me/30706426 Future versions will be visibly fixed. >> - Move the work done in pg_upgrade into a common API which can as well >> be used by pg_rewind as well. One place would be to have a >> frontend-only API in src/common which does the leg work. I would >> recommend working only on file descriptors as well for consistency with >> copy_file_range. > > pg_upgrade copies files, whereas pg_rewind needs to copy file ranges. > So I don't think this is going to be a good match. > > We could add support for using Linux copy_file_range() in pg_rewind, but > that would work a bit differently. I also don't have a good sense of > how to test the performance of that. One simple way to test that would be to limit the time it takes to scan the WAL segments on the target so as the filemap is computed quickly, and create many, say gigabyte-size relations on the promoted source which will need to be copied from the source to the target. > Another thing to think about is that we go through some trouble to > initialize new WAL files so that the disk space is fully allocated. If > we used file cloning calls in pg_rewind, that would potentially > invalidate some of that. At least, we'd have to think through this more > carefully. Agreed. Let's keep in mind such things but come with a sane, first cut of this patch based on the time remaining in this commit fest. >> - Add proper wait events for the backend calls. Those are missing for >> copy_file_range and copyfile. > > done + <entry><literal>CopyFileCopy</literal></entry> + <entry>Waiting for a file copy operation (if the copying is done by + an operating system call rather than as separate read and write + operations).</entry> CopyFileCopy is... Redundant. Perhaps CopyFileSystem or CopyFileRange? >> - For XLogFileCopy, the problem may be trickier as the tail of a segment >> is filled with zeroes, so dropping it from the first version of the >> patch sounds wiser. > > Seems like a possible follow-on project. But see also under pg_rewind > above. No objections to do that in the future for both. > Another oddity is that pg_upgrade uses CopyFile() on Windows, but the > backend does not. The Git log shows that the backend used to use > CopyFile(), but that was then removed when the generic code was added, > but when pg_upgrade was imported, it came with the CopyFile() call. You mean 558730ac, right? > I suspect the CopyFile() call can be quite a bit faster, so we should > consider adding it back in. Or if not, remove it from pg_upgrade. Hm. The proposed patch also removes an important property of what happens now in copy_file: the copied files are periodically synced to avoid spamming the cache, so for some loads wouldn't this cause a performance regression? At least on Linux it is possible to rely on sync_file_range which is called via pg_flush_data, so it seems to me that we ought to roughly keep the loop working on FLUSH_DISTANCE, and replace the calls of read/write by copy_file_range. copyfile is only able to do a complete file copy, so we would also lose this property as well on Linux. Even for Windows using CopyFile would be a step backwards for the backend. pg_upgrade is different though as it copies files fully, so using both copyfile and copy_file_range makes sense. At the end, it seems to me that using copy_file_range has some values as you save a set of read/write calls, but copyfile comes with its limitations, which I think will cause side issues, so I would recommend dropping it from a first cut of the patch for the backend. -- Michael
signature.asc
Description: PGP signature