On Tue, 3 Jun 2025, Thomas Munro wrote:
On Mon, Jun 2, 2025 at 10:14 PM Dimitrios Apostolou <ji...@gmx.net> wrote:
On Sun, 1 Jun 2025, Thomas Munro wrote:
Or for a completely different approach: I wonder if ftruncate() would
be more efficient on COW systems anyway. The minimum thing we need is
for the file system to remember the new size, 'cause, erm, we don't.
All the rest is probably a waste of cycles, since they reserve real
space (or fail to) later in the checkpointer or whatever process
eventually writes the data out.
FWIW I asked the btrfs devs. From
https://github.com/kdave/btrfs-progs/pull/976
I quote Qu Wenruo:
Only for falloc(), not ftruncate().
The PREALLOC inode flag is added for any preallocated file extent,
meanwhile truncate only creates holes.
truncate is fast but it's really different from fallocate by there is
nothing really allocated.
This means the later writes will need to allocate their own data
extents. This is fine and even preferred for btrfs, but may lead to
performance drop for more traditional fses.
We're in an era that fs features are not longer that generic, fallocate
is just one example, in fact fallocate will cause more problems more
than no compression.
It's really a deep rabbit hole, and is not something simple true or
false questions.
In other words, btrfs will not try to allocate anything with ftruncate(),
it will just mark the new space as a "hole". As such, the file is not
marked as "PREALLOC" which is what disables compression. Of course there
is no guarantee that further writes will succeed, and as quoted above,
other (non-COW) filesystems might be slower writing the
ftruncate()-allocated space.
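
The hole-versus-preallocation difference described above can be observed from userspace. What follows is only a minimal sketch: the helper name extend_and_stat() and the /tmp path are made up for illustration, and the block counts it reports depend entirely on the filesystem under /tmp.

```c
#define _XOPEN_SOURCE 700   /* for mkstemp() and posix_fallocate() */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create an anonymous temporary file, grow it to
 * `size` bytes with either ftruncate() or posix_fallocate(), and fill in
 * *st with the resulting fstat() data.
 * Returns 0 on success, -1 on any failure.
 */
int
extend_and_stat(int use_fallocate, off_t size, struct stat *st)
{
    char path[] = "/tmp/extend_demo_XXXXXX";
    int  fd = mkstemp(path);
    int  rc;

    assert(size >= 0);
    if (fd < 0)
        return -1;
    unlink(path);           /* the file disappears once fd is closed */

    if (use_fallocate)
        rc = posix_fallocate(fd, 0, size);  /* returns an errno value, 0 on success */
    else
        rc = ftruncate(fd, size);           /* returns -1 on failure */

    rc = (rc == 0) ? fstat(fd, st) : -1;
    close(fd);
    return rc;
}
```

Both variants should report the same st_size. Comparing st_blocks between the two calls is one way to see whether the filesystem actually reserved space or only recorded a hole.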
Yeah, right, I know. But PostgreSQL has at least two different goals
when extending a relation:
1. Remember the new size of the relation somewhere*.
2. Reserve space now, so that we can report ENOSPC and roll back the
transaction that wants to extend the relation when the disk is full,
instead of causing a checkpoint or buffer eviction to fail later (see
https://wiki.postgresql.org/wiki/ENOSPC for a longer version).
But the second thing just can't work on a COW system by definition, so
the whole notion is bogus, which is why I wondered if ftruncate() is
actually a reasonable option to have, even though it just creates
holes (on Unixen). I also know of another completely different reason
to want to use ftruncate(): NTFS, which *doesn't* create holes (NTFS
supports holes via other syscalls, but ftruncate() or rather
_chsize_s() as they spell it doesn't make them), making it more like
posix_fallocate() in this usage. So I was beginning to wonder if we
might want to experiment with a patch that adds
file_extend_method=fallocate,ftruncate,write. Perhaps accompanied by
a threshold setting below which it always writes.
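
To make the idea concrete, here is a rough sketch of how such a setting could dispatch between the three methods. It is not taken from any patch: the enum, extend_file(), write_zeros(), file_size() and the write_threshold parameter are all hypothetical names chosen for illustration, and only the POSIX calls themselves are real.

```c
#define _XOPEN_SOURCE 700   /* for pwrite() and posix_fallocate() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical mirror of file_extend_method=fallocate,ftruncate,write. */
typedef enum { EXTEND_FALLOCATE, EXTEND_FTRUNCATE, EXTEND_WRITE } ExtendMethod;

/* Write `len` zero bytes at `offset`; returns 0 or an errno value. */
static int
write_zeros(int fd, off_t offset, off_t len)
{
    static const char zeros[8192];

    while (len > 0)
    {
        size_t  chunk = len < (off_t) sizeof(zeros) ? (size_t) len : sizeof(zeros);
        ssize_t n = pwrite(fd, zeros, chunk, offset);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted, retry the same chunk */
            return errno;
        }
        if (n == 0)
            return EIO;         /* no progress; avoid looping forever */
        offset += n;
        len -= n;
    }
    return 0;
}

/*
 * Grow a file from old_size to new_size. Extensions smaller than
 * write_threshold always write zeros, whatever the method says.
 * Returns 0 on success or an errno value (e.g. ENOSPC).
 */
int
extend_file(int fd, off_t old_size, off_t new_size,
            ExtendMethod method, off_t write_threshold)
{
    off_t grow = new_size - old_size;

    assert(grow >= 0);
    if (grow == 0)
        return 0;
    if (method == EXTEND_WRITE || grow < write_threshold)
        return write_zeros(fd, old_size, grow);
    if (method == EXTEND_FALLOCATE)
        return posix_fallocate(fd, old_size, grow);  /* already an errno value */
    return ftruncate(fd, new_size) == 0 ? 0 : errno;
}

/* Ask the filesystem for the size it remembers; -1 on failure. */
off_t
file_size(int fd)
{
    struct stat st;

    return fstat(fd, &st) == 0 ? st.st_size : (off_t) -1;
}
```

In this sketch only the write and fallocate paths can report ENOSPC at extension time. The ftruncate path just records the new size, which matches goal 1 above but not goal 2.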
This sounds like the best solution IMO. People can then experiment with
different settings and filesystems, and that way we also learn in the
process. Thank you for the effort and patches so far.
Dimitris