Thanks, I learned something else: I didn't know Hetzner offered S3-compatible storage.
The interesting thing is that a few searches about performance return mostly negative impressions of their object storage compared to the original S3. It would be interesting to see what your benchmarks yield on a pure AWS setup. I am not asking you to do that, but you may get even better performance in that case :)

Cheers,
Seref

On Fri, Jul 18, 2025 at 11:58 AM Pierre Barre <pie...@barre.sh> wrote:
> Hi Seref,
>
> For the benchmarks, I used Hetzner's cloud service with the following
> setup:
>
> - A Hetzner S3 bucket in the FSN1 region
> - A virtual machine of type ccx63 (48 vCPU, 192 GB memory)
> - 3 ZeroFS NBD devices (same S3 bucket)
> - A ZFS striped pool with the 3 devices
> - 200 GB ZFS L2ARC
> - Postgres configured accordingly memory-wise, as well as with
>   synchronous_commit = off, wal_init_zero = off and wal_recycle = off.
>
> Best,
> Pierre
>
> On Fri, Jul 18, 2025, at 12:42, Seref Arikan wrote:
>
> Sorry, this was meant to go to the whole group:
>
> Very interesting! Great work. Can you clarify how exactly you're running
> postgres in your tests? A specific AWS service? What's the test
> infrastructure that sits above the file system?
>
> On Thu, Jul 17, 2025 at 11:59 PM Pierre Barre <pie...@barre.sh> wrote:
>
> Hi everyone,
>
> I wanted to share a project I've been working on that enables PostgreSQL
> to run on S3 storage while maintaining performance comparable to local
> NVMe. The approach uses block-level access rather than trying to map
> filesystem operations to S3 objects.
>
> ZeroFS: https://github.com/Barre/ZeroFS
>
> # The Architecture
>
> ZeroFS provides NBD (Network Block Device) servers that expose S3 storage
> as raw block devices. PostgreSQL runs unmodified on ZFS pools built on
> these block devices:
>
> PostgreSQL -> ZFS -> NBD -> ZeroFS -> S3
>
> By providing block-level access and leveraging ZFS's caching capabilities
> (L2ARC), we can achieve microsecond latencies despite the underlying
> storage being in S3.
>
> ## Performance Results
>
> Here are pgbench results from PostgreSQL running on this setup:
>
> ### Read/Write Workload
>
> ```
> postgres@ubuntu-16gb-fsn1-1:/root$ pgbench -c 50 -j 15 -t 100000 example
> pgbench (16.9 (Ubuntu 16.9-0ubuntu0.24.04.1))
> starting vacuum...end.
> transaction type: <builtin: TPC-B (sort of)>
> scaling factor: 50
> query mode: simple
> number of clients: 50
> number of threads: 15
> maximum number of tries: 1
> number of transactions per client: 100000
> number of transactions actually processed: 5000000/5000000
> number of failed transactions: 0 (0.000%)
> latency average = 0.943 ms
> initial connection time = 48.043 ms
> tps = 53041.006947 (without initial connection time)
> ```
>
> ### Read-Only Workload
>
> ```
> postgres@ubuntu-16gb-fsn1-1:/root$ pgbench -c 50 -j 15 -t 100000 -S example
> pgbench (16.9 (Ubuntu 16.9-0ubuntu0.24.04.1))
> starting vacuum...end.
> transaction type: <builtin: select only>
> scaling factor: 50
> query mode: simple
> number of clients: 50
> number of threads: 15
> maximum number of tries: 1
> number of transactions per client: 100000
> number of transactions actually processed: 5000000/5000000
> number of failed transactions: 0 (0.000%)
> latency average = 0.121 ms
> initial connection time = 53.358 ms
> tps = 413436.248089 (without initial connection time)
> ```
>
> These numbers are with 50 concurrent clients and the actual data stored in
> S3. Hot data is served from ZFS L2ARC and ZeroFS's memory caches, while
> cold data comes from S3.
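>
> To make the pool layout above concrete, the assembly looks roughly like
> this (a sketch only: device paths, the pool name, the cache partition and
> the NBD ports are illustrative, and the ZeroFS server invocations
> themselves are omitted):
>
> ```
> # Attach the three ZeroFS NBD exports as ordinary block devices
> # (ports and device paths are assumptions, not fixed by ZeroFS)
> nbd-client 127.0.0.1 10809 /dev/nbd0
> nbd-client 127.0.0.1 10810 /dev/nbd1
> nbd-client 127.0.0.1 10811 /dev/nbd2
>
> # Striped pool across the three devices, plus a local partition
> # acting as the 200 GB L2ARC cache
> zpool create tank /dev/nbd0 /dev/nbd1 /dev/nbd2
> zpool add tank cache /dev/nvme0n1p4
>
> # Benchmark-specific PostgreSQL settings (on top of the usual memory tuning)
> cat >> "$PGDATA/postgresql.conf" <<'EOF'
> synchronous_commit = off
> wal_init_zero = off
> wal_recycle = off
> EOF
> ```
>
> Note that synchronous_commit = off means a crash can lose the most
> recently committed transactions, so it is a benchmarking trade-off rather
> than a general recommendation.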
>
> ## How It Works
>
> 1. ZeroFS exposes NBD devices (e.g., /dev/nbd0) that PostgreSQL/ZFS can
>    use like any other block device
> 2. Multiple cache layers hide S3 latency:
>    a. ZFS ARC/L2ARC for frequently accessed blocks
>    b. ZeroFS memory cache for metadata and hot data
>    c. Optional local disk cache
> 3. All data is encrypted (ChaCha20-Poly1305) before hitting S3
> 4. Files are split into 128KB chunks for insertion into ZeroFS' LSM-tree
>
> ## Geo-Distributed PostgreSQL
>
> Since each region can run its own ZeroFS instance, you can create
> geographically distributed PostgreSQL setups.
>
> Example architectures:
>
> Architecture 1:
>
>                        PostgreSQL Client
>                               |
>                               | SQL queries
>                               |
>                        +--------------+
>                        |   PG Proxy   |
>                        |  (HAProxy/   |
>                        |  PgBouncer)  |
>                        +--------------+
>                           /        \
>                          /          \
>                 Synchronous        Synchronous
>                 Replication        Replication
>                        /              \
>                       /                \
>          +---------------+        +---------------+
>          | PostgreSQL 1  |        | PostgreSQL 2  |
>          |  (Primary)    |◄------►|  (Standby)    |
>          +---------------+        +---------------+
>                  |                        |
>                  |  POSIX filesystem ops  |
>                  |                        |
>          +---------------+        +---------------+
>          |  ZFS Pool 1   |        |  ZFS Pool 2   |
>          | (3-way mirror)|        | (3-way mirror)|
>          +---------------+        +---------------+
>           /      |      \          /      |      \
>          /       |       \        /       |       \
>    NBD:10809 NBD:10810 NBD:10811 NBD:10812 NBD:10813 NBD:10814
>        |         |         |         |         |         |
>    +--------++--------++--------++--------++--------++--------+
>    |ZeroFS 1||ZeroFS 2||ZeroFS 3||ZeroFS 4||ZeroFS 5||ZeroFS 6|
>    +--------++--------++--------++--------++--------++--------+
>        |         |         |         |         |         |
>        |         |         |         |         |         |
>   S3-Region1 S3-Region2 S3-Region3 S3-Region4 S3-Region5 S3-Region6
>   (us-east)  (eu-west)  (ap-south) (us-west)  (eu-north) (ap-east)
>
> Architecture 2:
>
>   PostgreSQL Primary (Region 1) ←→ PostgreSQL Standby (Region 2)
>                  \                        /
>                   \                      /
>                     Same ZFS Pool (NBD)
>                             |
>                      6 Global ZeroFS
>                             |
>                         S3 Regions
>
> The main advantages I see are:
> 1. Dramatic cost reduction for large datasets
> 2. Simplified geo-distribution
> 3. Infinite storage capacity
> 4. Built-in encryption and compression
>
> Looking forward to your feedback and questions!
>
> Best,
> Pierre
>
> P.S. The full project includes a custom NFS filesystem too.
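>
> For Architecture 1, each of the two ZFS pools is simply a 3-way mirror
> over NBD devices whose ZeroFS backends sit in different S3 regions. A
> sketch for Pool 1 (hostnames and device paths are illustrative):
>
> ```
> # One NBD device per ZeroFS instance / S3 region (ports as in the diagram,
> # hostnames are made up for the example)
> nbd-client zerofs-us-east.internal 10809 /dev/nbd0
> nbd-client zerofs-eu-west.internal 10810 /dev/nbd1
> nbd-client zerofs-ap-south.internal 10811 /dev/nbd2
>
> # 3-way mirror: every block is written to all three regions
> zpool create pgpool1 mirror /dev/nbd0 /dev/nbd1 /dev/nbd2
> ```
>
> ZFS spreads reads across the mirror members and the pool keeps working as
> long as at least one of the three backends remains reachable, which is
> where the region-level redundancy comes from.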