Hi,

On 2024-12-09 15:47:55 +0100, Tomas Vondra wrote:
> On 12/9/24 11:27, Jakub Wartak wrote:
> > On Mon, Dec 9, 2024 at 10:19 AM Michael Harris <har...@gmail.com> wrote:
> > >
> > > Hi Michael,
> > >
> > > We found this thread describing similar issues:
> > >
> > > https://www.postgresql.org/message-id/flat/AS1PR05MB91059AC8B525910A5FCD6E699F9A2%40AS1PR05MB9105.eurprd05.prod.outlook.com
> > >
> >
> > We've got some case in the past here in EDB, where an OS vendor has
> > blamed XFS AG fragmentation (too many AGs, and if one AG is not having
> > enough space -> error). Could You perhaps show us output of on that LUN:
> > 1. xfs_info
> > 2. run that script from https://www.suse.com/support/kb/doc/?id=000018219 for
> > Your AG range
> >
>
> But this can be reproduced on a brand new filesystem - I just tried
> creating a 1GB image, create XFS on it, mount it, and fallocate a 600MB
> file twice. Which that fails, and there can't be any real fragmentation.
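A toy model of the reproduction above may make the failure easier to see. This is a hypothetical sketch, not XFS's allocator: it assumes the filesystem requires the full requested length to be free before it consults the file's existing layout, counts space in 1MB blocks, and ignores metadata overhead. `ToyFS` and `fallocate()` here are illustrative names only.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFS:
    """Hypothetical toy filesystem; NOT how XFS is actually implemented.

    Space is counted in 1MB blocks. Each file maps to the set of block
    numbers it already has allocated.
    """
    free_blocks: int
    files: dict[str, set[int]] = field(default_factory=dict)

    def fallocate(self, name: str, offset_mb: int, length_mb: int) -> bool:
        """Return True on success, False where a real call would hit ENOSPC."""
        allocated = self.files.setdefault(name, set())
        # Assumed behaviour: reserve the worst case (the whole requested
        # range) up front, before looking at what is already allocated.
        if length_mb > self.free_blocks:
            return False
        # Only blocks not yet allocated actually consume free space.
        wanted = set(range(offset_mb, offset_mb + length_mb))
        new_blocks = wanted - allocated
        allocated |= new_blocks
        self.free_blocks -= len(new_blocks)
        return True


if __name__ == "__main__":
    fs = ToyFS(free_blocks=1024)          # roughly a 1GB filesystem
    print(fs.fallocate("f", 0, 600))      # True: 424 blocks left
    print(fs.fallocate("f", 0, 600))      # False: 600 > 424, though 0 new blocks needed
```

Under these assumptions the second call fails even though the range is already fully allocated, with no fragmentation involved. Extending a file at its end (e.g. `fs.fallocate("f", 600, 300)` after the first call) touches only unallocated blocks, so in this model the worst-case reservation equals the real need.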
If I understand correctly, XFS checks whether there's enough free space for the fallocate() to succeed before even looking at the file's current layout. Here's an explanation of why:

https://www.spinics.net/lists/linux-xfs/msg55429.html

  The real problem with preallocation failing part way through due to
  overcommit of space is that we can't go back and undo the allocation(s)
  made by fallocate because when we get ENOSPC we have lost all the state
  of the previous allocations made. If fallocate is filling holes between
  unwritten extents already in the file, then we have no way of knowing
  where the holes we filled were and hence cannot reliably free the space
  we've allocated before ENOSPC was hit.

  I.e. reserving space as you go would leave you open to ending up with
  some, but not all, of those allocations having been made. Whereas
  pre-reserving the worst case space needed, ahead of time, ensures that
  you have enough space to go through it all.

  You can't just go through the file [range] and compute how much free
  space you will need to allocate and then do a second pass through the
  file, because the file layout might have changed concurrently...

This seems independent of the issue Michael is having, though: Postgres, afaik, won't fallocate() huge ranges that already contain allocated space.

Greetings,

Andres Freund