On Tue, Jun 3, 2025 at 1:58 AM Dimitrios Apostolou <ji...@gmx.net> wrote:
> This sounds like the best solution IMO. People can then experiment with
> different settings and filesystems, and that way we also learn in the
> process. Thank you for the effort and patches so far.
OK, here's a basic patch to experiment with. You can set:

file_extend_method = fallocate | ftruncate | write
file_extend_method_threshold = 8 # (below 8 always write, 0 means always write)

To really make COPY fly we also need to get write combining and AIO going (we've had this working with various prototypes, but it all missed the boat for v18, which can only do that stuff for reads). Then you'll have concurrent 128kB or up to 1MB writes trundling along in the background, which I guess should work pretty nicely for stuff like BTRFS/ZFS, compression and all that jazz.
From 8513b2ec3d31cb5afed9ffc1952326905fd90732 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Sat, 31 May 2025 22:50:22 +1200
Subject: [PATCH] Add file_extend_method setting.

BTRFS's compression is reported to be disabled by posix_fallocate(), so
offer a way to turn it off by setting it to either write or ftruncate
instead. May also be useful for Windows, which lacks fallocate but is
known to allocate space on ftruncate.

The previous coding had a threshold of 8 blocks before using a
bulk-extension system call instead of writing zeroes, so also make that
configurable, as file_extend_method_threshold. 0 means never, and other
numbers specify a threshold in blocks, defaulting to 8 as before.

XXX WIP

Reported-by: Dimitrios Apostolou <ji...@gmx.net>
Discussion: https://postgr.es/m/b1843124-fd22-e279-a31f-252dffb6fbf2%40gmx.net
---
 src/backend/storage/file/fd.c                 |  6 ++++
 src/backend/storage/smgr/md.c                 | 27 ++++++++++------
 src/backend/utils/misc/guc_tables.c           | 31 +++++++++++++++++++
 src/backend/utils/misc/postgresql.conf.sample |  2 ++
 src/include/storage/fd.h                      | 13 ++++++++
 5 files changed, 70 insertions(+), 9 deletions(-)

diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 0e8299dd556..046e285e84f 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -164,6 +164,12 @@ bool		data_sync_retry = false;
 /* How SyncDataDirectory() should do its job. */
 int			recovery_init_sync_method = DATA_DIR_SYNC_METHOD_FSYNC;
 
+/* How data files should be bulk-extended with zeroes. */
+int			file_extend_method = DEFAULT_FILE_EXTEND_METHOD;
+
+/* At what size file_extend_method is used instead of plain write. */
+int			file_extend_method_threshold = 8;
+
 /* Which kinds of files should be opened with PG_O_DIRECT. */
 int			io_direct_flags;
 
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 2ccb0faceb5..e1de6a26a67 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -588,23 +588,32 @@ mdzeroextend(SMgrRelation reln, ForkNumber forknum,
 		 * to allocate page cache space for the extended pages.
 		 *
 		 * However, we don't use FileFallocate() for small extensions, as it
-		 * defeats delayed allocation on some filesystems. Not clear where
-		 * that decision should be made though? For now just use a cutoff of
-		 * 8, anything between 4 and 8 worked OK in some local testing.
+		 * defeats delayed allocation on some filesystems.
 		 */
-		if (numblocks > 8)
+		if (file_extend_method_threshold > 0 &&
+			numblocks >= file_extend_method_threshold &&
+			file_extend_method != FILE_EXTEND_METHOD_WRITE)
 		{
 			int			ret;
 
-			ret = FileFallocate(v->mdfd_vfd,
-								seekpos, (off_t) BLCKSZ * numblocks,
-								WAIT_EVENT_DATA_FILE_EXTEND);
+			if (file_extend_method == FILE_EXTEND_METHOD_FTRUNCATE)
+				ret = FileTruncate(v->mdfd_vfd,
+								   seekpos + (off_t) BLCKSZ * numblocks,
+								   WAIT_EVENT_DATA_FILE_EXTEND);
+#ifdef FILE_EXTEND_METHOD_FALLOCATE
+			else
+				ret = FileFallocate(v->mdfd_vfd,
+									seekpos, (off_t) BLCKSZ * numblocks,
+									WAIT_EVENT_DATA_FILE_EXTEND);
+#endif
 			if (ret != 0)
 			{
 				ereport(ERROR,
 						errcode_for_file_access(),
-						errmsg("could not extend file \"%s\" with FileFallocate(): %m",
-							   FilePathName(v->mdfd_vfd)),
+						errmsg("could not extend file \"%s\" with %s(): %m",
+							   FilePathName(v->mdfd_vfd),
+							   file_extend_method == FILE_EXTEND_METHOD_FTRUNCATE ?
+							   "FileTruncate" : "FileFallocate"),
 						errhint("Check free disk space."));
 			}
 		}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index f04bfedb2fd..3d779d3f4dc 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -491,6 +491,15 @@ static const struct config_enum_entry file_copy_method_options[] = {
 	{NULL, 0, false}
 };
 
+static const struct config_enum_entry file_extend_method_options[] = {
+	{"write", FILE_EXTEND_METHOD_WRITE, false},
+	{"ftruncate", FILE_EXTEND_METHOD_FTRUNCATE, false},
+#ifdef FILE_EXTEND_METHOD_FALLOCATE
+	{"fallocate", FILE_EXTEND_METHOD_FALLOCATE, false},
+#endif
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -3265,6 +3274,18 @@ struct config_int ConfigureNamesInt[] =
 		NULL
 	},
 
+	{
+		{"file_extend_method_threshold",
+			PGC_USERSET,
+			RESOURCES_DISK,
+			gettext_noop("Threshold for using methods other than write when extending data files."),
+			NULL,
+			GUC_UNIT_BLOCKS
+		},
+		&file_extend_method_threshold,
+		8, 0, INT_MAX
+	},
+
 	{
 		{"io_max_combine_limit",
 			PGC_POSTMASTER,
@@ -5264,6 +5285,16 @@ struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"file_extend_method", PGC_USERSET, RESOURCES_DISK,
+			gettext_noop("Selects the method used for extending data files."),
+			NULL
+		},
+		&file_extend_method,
+		DEFAULT_FILE_EXTEND_METHOD, file_extend_method_options,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"wal_sync_method", PGC_SIGHUP, WAL_SETTINGS,
 			gettext_noop("Selects the method used for forcing WAL updates to disk."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 341f88adc87..4dbad4400c8 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -179,6 +179,8 @@
 					# in kilobytes, or -1 for no limit
 
 #file_copy_method = copy		# copy, clone (if supported by OS)
+#file_extend_method = fallocate	# fallocate, ftruncate, write
+#file_extend_method_threshold = 8	# min to prefer selected method, 0 = never
 
 #max_notify_queue_pages = 1048576	# limits the number of SLRU pages allocated
 					# for NOTIFY / LISTEN queue
diff --git a/src/include/storage/fd.h b/src/include/storage/fd.h
index b77d8e5e30e..25a39e6d539 100644
--- a/src/include/storage/fd.h
+++ b/src/include/storage/fd.h
@@ -55,11 +55,24 @@ typedef int File;
 #define IO_DIRECT_WAL			0x02
 #define IO_DIRECT_WAL_INIT		0x04
 
+#define FILE_EXTEND_METHOD_WRITE		1
+#define FILE_EXTEND_METHOD_FTRUNCATE	2
+#ifdef HAVE_POSIX_FALLOCATE
+#define FILE_EXTEND_METHOD_FALLOCATE	3
+#endif
+
+#ifdef FILE_EXTEND_METHOD_FALLOCATE
+#define DEFAULT_FILE_EXTEND_METHOD FILE_EXTEND_METHOD_FALLOCATE
+#else
+#define DEFAULT_FILE_EXTEND_METHOD FILE_EXTEND_METHOD_WRITE
+#endif
 
 /* GUC parameter */
 extern PGDLLIMPORT int max_files_per_process;
 extern PGDLLIMPORT bool data_sync_retry;
 extern PGDLLIMPORT int recovery_init_sync_method;
+extern PGDLLIMPORT int file_extend_method_threshold;
+extern PGDLLIMPORT int file_extend_method;
 extern PGDLLIMPORT int io_direct_flags;
 
 /*
-- 
2.47.2