Re: Error:could not extend file " with FileFallocate(): No space left on device

Aleksandr Fedorov Wed, 14 Jan 2026 03:48:43 -0800

Dear community,

Based on the analysis of logs collected from several incidents under OEL 8.10 / 9.3, the most likely cause is local exhaustion of free space in an allocation group in the XFS filesystem.

Further investigation revealed that a similar issue is documented in the Red Hat knowledge base (https://access.redhat.com/solutions/7129010), describing ENOSPC errors from the fallocate() function in XFS filesystems during PostgreSQL backup operations. Red Hat references the commit https://github.com/torvalds/linux/commit/6773da870ab89123d1b513da63ed59e32a29cb77 and believes that this kernel fix may address the PostgreSQL issue.

After analyzing the change set from this commit, we identified the following combination of conditions that can trigger the ENOSPC error:

1. Presence of delayed allocations (writes already accepted into the page cache but not yet written to disk).

2. Insufficient free space in the allocation group to cover all pending delayed allocations.

A subsequent search of the PostgreSQL community knowledge base led to the message https://www.postgresql.org/message-id/[email protected].


Important points to highlight from this message:

1. Since kernel versions 2.6.x, XFS has implemented dynamic speculative preallocation.

2. The term "dynamic" means the preallocation size is regulated by internal heuristics.

3. These heuristics are based on file access patterns and history.

4. Additional space allocated during preallocation is intended to prevent file fragmentation.

5. When a file extends, its data is written into extents that may be distributed across one or more allocation groups.

6. Delayed allocation writes allow merging multiple allocations into preallocated space before writing to disk, reducing the number of extents and thus file fragmentation.

7. The logic for tracking additional space retains it as long as there are in-memory references to the file (for example, in an actively running PostgreSQL database).

8. The XFS filesystem itself considers this space as used.

9. The actual (allocated) size of a file may exceed the 1 GB PostgreSQL segment size limit (not to be confused with its apparent size).

This is confirmed by information collected with the `du -h` command, which shows "actual" file sizes and helps detect files larger than 1 GB at the time the command is executed (some even up to 2 GB, although we know that the maximum size is 1 GB). There may have been more such files, but after the replica crashed, the file descriptors were released, causing the "actual" sizes to return to normal.

Dynamic speculative preallocation can be disabled by specifying the `allocsize` mount option when mounting the XFS filesystem.


We would like to share additional observations to help resolve the issue.

We were able to reproduce the original problem in two ways: directly on a PostgreSQL replica, and with a C program.

The first method is a test script (please see the attached README_test_pg.md) that mounts the disk holding PGDATA with the option `allocsize=$((1*1024*1024))`. The pgbench_accounts table is generated with the pgbench tool, and multiple copies of this table are created and populated in parallel. While these small tables are being filled (each table is no larger than 25 MB when the script completes), numerous delayed preallocation events occur, consuming free disk space. The subsequent parallel INSERT statements then crash the replica because there is no contiguous free space left on the disk to extend the file of the large table.

Here is an example of the available free space on the mount points after the replica crashed with the ENOSPC error (pgdata_main belongs to the primary server and pgdata_repl to the replica):

Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/loop0     xfs   4.0G  4.0G   74M  99% /pgdata_main
/dev/loop3     xfs   4.0G  3.8G  280M  94% /pgdata_repl

You may observe that when the issue is reproduced and the replica crashes, the available disk space on the replica side appears larger than on the primary side. However, the ENOSPC error in the logs indicates that disk space was exhausted, and this is indeed accurate: after the crash, all file descriptors were released, and the space previously preallocated to files was reclaimed by the filesystem. Monitoring file sizes with `du -h` right before the crash and some time after it shows the file sizes decreasing from 26 MB to 25 MB.

The issue does not occur with the minimum possible value of the allocsize parameter, allocsize=$((4*1024)). Testing various allocsize values under a specific workload on PostgreSQL with synchronous physical replication shows:

+-------------------+------------------+---------------------------------------------------------------------+
| allocsize setting | Thread model     | Result                                                              |
+-------------------+------------------+---------------------------------------------------------------------+
| 1M                | single thread    | No issues observed                                                  |
+-------------------+------------------+---------------------------------------------------------------------+
| 1M                | multiple threads | Replica failed: "could not extend file ... No space left on device" |
+-------------------+------------------+---------------------------------------------------------------------+
| 1GB               | multiple threads | Primary failed: "could not extend file ... No space left on device" |
+-------------------+------------------+---------------------------------------------------------------------+
| 4KB               | multiple threads | No failure occurred                                                 |
+-------------------+------------------+---------------------------------------------------------------------+

Another method is a C program (please find README_test_c.md) which reproduces the ENOSPC error on kernel version 5.15.0-101.103.2.1.el9uek.x86_64. The program first writes 748 KB to a file and then allocates an additional 16 KB using posix_fallocate(). If posix_fallocate() fails, it displays a corresponding message and retries the operation.

The second attempt succeeds, indicating that space was available.

However, the program does not fully reproduce the potential PostgreSQL scenario. The key differences are:

1. The program uses a single process with a single thread, whereas real systems involve one process with multiple threads or multiple processes operating on files.

2. The program uses a fixed buffer size for the mounted filesystem's journal, whereas in production environments the buffer size is dynamic (allocated based on historical space usage, i.e., workload-dependent).

3. The issue does not occur when there are multiple allocation groups that are completely empty.


In our practice, we identified two viable approaches:

1. As a permanent solution: upgrade the UEK kernel.
   Note that the fix has not been backported to all UEK versions:
   - It is not present in UEK7 (5.15.x).
   - It is present in UEK8 (6.12.x, available starting with OL 9.5) from kernel version 6.12.0-0.20.20 onwards.

2. As a temporary solution: use the allocsize parameter to disable dynamic speculative preallocation. However, since this does not fix the root cause, failures may still occur.


On 9/10/24 17:11, Pecsök Ján wrote:

Dear community,
After upgrading Postgres from version 13.5 to 16.2, we experience the following error:
could not extend file "pg_tblspc/16401/PG_16_202307071/17820/3968302971" with FileFallocate(): No space left on device
We cannot easily replicate the problem. It happens randomly every 1-2 weeks of intensive query computation.
Were there any changes in space allocation from Postgres 13.5 to Postgres 16.2?
The database has a size of 91TB and has 27TB more space available.

# Reproducing ENOSPC Error in PostgreSQL

This script reproduces an `ENOSPC` (Error: No Space Left on Device) condition in PostgreSQL by exploiting filesystem-level extent allocation behavior under high-concurrency workloads. The issue is triggered by a combination of:

*   Mount option `allocsize=1M` on the `$PGDATA` mount point
*   Creation of many small tables (preallocating filesystem extents)
*   Parallel bulk inserts into a single large table (fragmenting free space)

This mimics real-world scenarios such as data migration or bulk ETL operations, where filesystem fragmentation leads to allocation failures even when total free space appears sufficient.

---

## Prerequisites

*   PostgreSQL 16.1 with `pgbench` installed
*   **XFS** filesystem (recommended)
*   `$PGDATA` and WAL logs located on different mount points
*   Mount option: `allocsize=1M` (or higher) on the `$PGDATA` mount point  
    *(This forces larger preallocation units, increasing fragmentation risk)*
*   Sufficient disk space (≥ 50 GB recommended)
*   Linux environment with `psql`, `pgbench`, `xargs`, and `seq`

---

## Key Factors for Reproduction

| Factor                    | Recommended Value   | Purpose                                                         |
| :------------------------ | :------------------ | :-------------------------------------------------------------- |
| `allocsize` mount option  | 1M                  | Forces large preallocations, increasing fragmentation risk      |
| Number of small tables    | 100–200             | Consumes allocation groups/clusters                             |
| Parallel threads          | 50–150              | Increases concurrency and allocation contention                 |
| Total rows inserted       | 5M–10M              | Pushes insert size beyond available contiguous extents          |
| Filesystem                | XFS                 | Exhibits this behavior under high fragmentation                 |

---

## Environment Setup

Set up XFS filesystems on separate disks for PGDATA and PGWAL with appropriate mount options:

```bash
# Format PGDATA disk with separate journal device and 128 allocation groups
mkfs.xfs -f -d agcount=128 -l logdev=/dev/journal_disk,size=64m /dev/pgdata_disk

# Format PGWAL disk
mkfs.xfs -f -d agcount=16 /dev/pgwal_disk

# Create mount points
mkdir /pgdata
mkdir /pgwal

# Mount PGDATA with allocsize=1M
mount -t xfs -o logdev=/dev/journal_disk,allocsize=1048576 /dev/pgdata_disk /pgdata

# Mount PGWAL
mount -t xfs /dev/pgwal_disk /pgwal
```

Important configuration details:

* PGDATA filesystem: XFS with separate journal device, mounted with allocsize=1M option
* Allocation groups: 128 AGs for PGDATA to increase fragmentation potential
* Separate mount: PGWAL on different disk/filesystem to isolate WAL impact
* Disk sizing: PGDATA disk should have sufficient space (≥ 50GB recommended)
* PostgreSQL configuration: ensure `data_directory` points to /pgdata and consider placing the WAL directory on /pgwal (for example, with `initdb --waldir=/pgwal`).

---

## Reproduction Script

The following bash script reproduces the ENOSPC error.

```bash
# Step 1: create initial table which will be used for copying rows
echo "preparing data.."
pgbench -U postgres -h localhost -p 5432 -i -I t postgres

# Step 2: Insert baseline data
psql -U postgres -h localhost -p 5432 -c "INSERT INTO pgbench_accounts(aid,bid,abalance,filler) SELECT gs.i AS aid,NULL,0,substring(md5(random()::text),0,84) from generate_series(1, 200000) gs(i)"

# Step 3: create 128 small tables in parallel (preallocates extents across AGs)
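# {} is the xargs placeholder for the table number ($$ would be expanded by the shell to its own PID)
for i in $(seq 1 128); do echo $i; done | xargs -r -P 12 -I {} psql -U postgres -h localhost -p 5432 -c "create table pgbench_accounts{} as select * from pgbench_accounts" > /dev/null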

# Step 4: clean up initial schema
pgbench -U postgres -h localhost -p 5432 -i -I d postgres

# Step 5: workload parameters (PARTS batches of RANGE rows, run by up to THREADS parallel psql sessions)
echo "reproducing.."
export THREADS=100
export PARTS=100
export TOTAL=6000000
export RANGE=$((TOTAL/PARTS))

# Step 6: insert 6M rows in 100 parallel batches into pgbench_accounts1
for i in $(seq 1 $PARTS); do echo $i; done | xargs -r -P $THREADS -I {} psql -U postgres -h localhost -p 5432 -c "INSERT INTO pgbench_accounts1(aid,bid,abalance,filler) SELECT ({}*$RANGE)::integer+gs.i AS aid,NULL,0,substring(md5(random()::text),0,84) from generate_series(1, $RANGE) gs(i)" > /dev/null

# Step 7: final insert to push past threshold
psql -U postgres -h localhost -p 5432 -c "INSERT INTO pgbench_accounts1(aid,bid,abalance,filler) SELECT gs.i AS aid,NULL,0,substring(md5(random()::text),0,84) from generate_series(1, 200000) gs(i)" > /dev/null
```

---

## Important Notes
1. Step 3 creates 128 tables; this consumes many allocation groups (AGs) on XFS and produces a lot of delayed preallocation events.
2. Step 6 triggers the actual issue, with the message "FATAL: could not extend file "base/xxxxx/xxxxxxxxx.xxxxx" with FileFallocate(): No space left on device". Because of the prior fragmentation from the small tables, the filesystem cannot find a large enough contiguous free region, even if total free space is high (the space is not available because it is held by open file descriptors).
3. Step 7 should complete successfully if the ENOSPC issue did NOT occur, which shows that there is enough space for the last step.
4. After a crash or restart, space is reclaimed as file descriptors are released. This makes the issue appear intermittent, but the root cause is filesystem fragmentation due to speculative preallocation, not actual disk exhaustion.

# Reproducing ENOSPC Error in C Program

This C program reproduces an `ENOSPC` (Error: No Space Left on Device) condition by demonstrating how filesystem fragmentation can cause allocation failures even when sufficient free space exists. The issue occurs when:

*   Filesystem is mounted with `allocsize=1M`
*   Many small files preallocate space across allocation groups
*   A large file is extended, followed by a small extension attempt

The program shows that while the initial large write succeeds, a subsequent small `posix_fallocate()` call fails with ENOSPC on the first attempt but succeeds on retry, proving that free space exists but isn't contiguous.

---

## Prerequisites

*   Filesystem: **XFS** with separate disk for journal
*   Mount option: `allocsize=1M` on the target mount point
*   Sufficient disk space (≥ 5 GB recommended)
*   Single-threaded execution
*   Linux environment with XFS development tools
*   C compiler (gcc/clang)

---

## Key Factors for Reproduction

| Factor                    | Recommended value | Purpose                                                                 |
| :------------------------ | :---------------- | :---------------------------------------------------------------------- |
| `allocsize` mount option  | 1M                | Forces 1MB preallocation units, increasing fragmentation risk           |
| Small files (31MB each)   | 128               | Consumes allocation groups, fragmenting free space                      |
| Large initial write       | 748KB             | Creates substantial file growth within fragmented space                 |
| Small fallocate attempt   | 16KB              | Tests allocation of small contiguous space in fragmented environment    |
| Immediate retry           | on failure        | Demonstrates space exists but wasn't contiguous on first attempt        |

---

## Environment Setup

Create an XFS filesystem with separate journal device and mount with `allocsize=1M`:

```bash
# Format disk (replace /dev/sdX with your data disk, /dev/sdY with journal disk)
mkfs.xfs -f -d agcount=128 -l logdev=/dev/sdY,size=64m /dev/sdX

# Mount with allocsize=1M
mkdir /mnt/test
mount -t xfs -o logdev=/dev/sdY,allocsize=1048576 /dev/sdX /mnt/test
```

---

## Preparing C program
```bash
cat > test.c << 'EOF'
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
	/* Directory populated in "Preparing data" (files 000.dat ... 127.dat) */
	const char *basedir = "/mnt/test";
	char filename[256];
	char writebuf[1024];
	int inc1 = 748;	/* KB appended with write() */
	int inc2 = 16;	/* KB reserved afterwards with posix_fallocate() */

	snprintf(filename, sizeof(filename), "%s/%03i.dat", basedir, 1);
	memset(writebuf, 1, sizeof(writebuf));

	printf("Opening file %s\n", filename);
	int fd = open(filename, O_RDWR);
	if (fd == -1) {
		printf("Error on open file %s (code=%i)\n", filename, errno);
		return 1;
	}
	off_t fs = lseek(fd, 0, SEEK_END);
	printf("Current file size: %lld\n", (long long)fs);

	printf("Writing %i bytes at the file end\n", inc1 * 1024);
	for (int i = 0; i < inc1; i++) {
		ssize_t write_result = write(fd, writebuf, sizeof(writebuf));
		if (write_result == -1) {
			printf("Error on write (code=%i)\n", errno);
			close(fd);
			return 1;
		}
	}

	/* Test: reserve inc2 KB directly after the written data, retrying once on ENOSPC */
	int iteration = 1;
	int test_result = 0;
	do {
		printf("Allocate additional %i bytes at the file end\n", inc2 * 1024);
		/* posix_fallocate() returns an error number instead of setting errno */
		test_result = posix_fallocate(fd, fs + inc1 * 1024, inc2 * 1024);
		if (test_result != 0) {
			if (test_result == ENOSPC) {
				printf("Error ENOSPC on posix_fallocate!\n");
				if (iteration++ < 2) {
					printf("Retrying operation...\n");
					continue;
				}
			}
			else
				printf("Error on posix_fallocate (code=%i)\n", test_result);
			close(fd);
			return 1;
		}
	} while (test_result != 0);
	printf("Done\n");
	close(fd);
	return 0;
}
EOF
echo "Compile test tool"
gcc -o test test.c
```

---


## Preparing data
```bash
# Create 128 files, each preallocating 31MB
for i in {000..127}; do
    fallocate -x -l 31M "/mnt/test/${i}.dat"
done
```

---


## Reproducing by C Program
```bash
./test
# output (excerpt):
#Writing 765952 bytes at the file end
#Allocate additional 16384 bytes at the file end
#Error ENOSPC on posix_fallocate!
#Retrying operation...
#Allocate additional 16384 bytes at the file end
#Done
```

---


## Important Notes
1. ENOSPC is intermittent: the first posix_fallocate() call fails with ENOSPC, but the identical retry succeeds immediately. This proves that free space exists but wasn't contiguous on the first attempt.
2. Root cause: filesystem fragmentation caused by:
   * allocsize=1M forcing large preallocation units
   * 128 small files consuming allocation groups
   * Large file extension (748KB) fragmenting remaining space
3. Not a disk space issue: the successful retry demonstrates that sufficient free space exists. The failure is due to the inability to find contiguous space for the small (16KB) allocation.
