> The interesting thing is, a few searches about the performance return mostly
> negative impressions about their object storage in comparison to the original
> S3.

I think they had a rough start, but it's quite good now from what I've experienced. It's also dirt-cheap, and they don't bill for operations. So if you run ZeroFS on that, you only pay for raw storage: €4.99 a month.
Combine that with their dirt-cheap dedicated servers (https://www.hetzner.com/dedicated-rootserver/matrix-ax/), and you can have the <€50-a-month, multi-terabyte Postgres database I'm dreaming of. I'd like to run https://www.merklemap.com/ on such a setup, but it's too early yet :)

> Finding out what kind of performance your benchmarks would yield on a pure
> AWS setting would be interesting. I am not asking you to do that, but you may
> get even better performance in that case :)

Yes, I need to try that!

Best,
Pierre

On Fri, Jul 18, 2025, at 14:55, Seref Arikan wrote:
> Thanks, I learned something else: I didn't know Hetzner offered S3-compatible
> storage.
>
> The interesting thing is, a few searches about the performance return mostly
> negative impressions about their object storage in comparison to the original
> S3.
>
> Finding out what kind of performance your benchmarks would yield on a pure
> AWS setting would be interesting. I am not asking you to do that, but you may
> get even better performance in that case :)
>
> Cheers,
> Seref
>
> On Fri, Jul 18, 2025 at 11:58 AM Pierre Barre <pie...@barre.sh> wrote:
>> Hi Seref,
>>
>> For the benchmarks, I used Hetzner's cloud service with the following setup:
>>
>> - A Hetzner S3 bucket in the FSN1 region
>> - A virtual machine of type ccx63 (48 vCPUs, 192 GB memory)
>> - 3 ZeroFS NBD devices (backed by the same S3 bucket)
>> - A ZFS striped pool across the 3 devices
>> - A 200 GB ZFS L2ARC
>> - Postgres configured accordingly memory-wise, as well as with
>>   synchronous_commit = off, wal_init_zero = off and wal_recycle = off
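>>
>> For illustration, a minimal sketch of how such a pool can be assembled. The ports, device names, pool name, and cache partition below are placeholders, not the exact values from my setup:
>>
>> ```
>> # Attach the three NBD devices exported by the ZeroFS instances
>> # (ports and export names depend on how ZeroFS is configured)
>> nbd-client 127.0.0.1 10809 /dev/nbd0
>> nbd-client 127.0.0.1 10810 /dev/nbd1
>> nbd-client 127.0.0.1 10811 /dev/nbd2
>>
>> # Striped pool across the three devices (no redundancy at this layer),
>> # plus a local partition as an L2ARC cache device
>> zpool create pgpool /dev/nbd0 /dev/nbd1 /dev/nbd2
>> zpool add pgpool cache /dev/nvme0n1p1   # ~200 GB local cache
>>
>> # postgresql.conf, in addition to the usual memory tuning:
>> #   synchronous_commit = off
>> #   wal_init_zero = off
>> #   wal_recycle = off
>> ```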
>>
>> Best,
>> Pierre
>>
>> On Fri, Jul 18, 2025, at 12:42, Seref Arikan wrote:
>>> Sorry, this was meant to go to the whole group:
>>>
>>> Very interesting! Great work. Can you clarify how exactly you're running
>>> postgres in your tests? A specific AWS service? What's the test
>>> infrastructure that sits above the file system?
>>>
>>> On Thu, Jul 17, 2025 at 11:59 PM Pierre Barre <pie...@barre.sh> wrote:
>>>> Hi everyone,
>>>>
>>>> I wanted to share a project I've been working on that enables PostgreSQL
>>>> to run on S3 storage while maintaining performance comparable to local
>>>> NVMe. The approach uses block-level access rather than trying to map
>>>> filesystem operations to S3 objects.
>>>>
>>>> ZeroFS: https://github.com/Barre/ZeroFS
>>>>
>>>> # The Architecture
>>>>
>>>> ZeroFS provides NBD (Network Block Device) servers that expose S3 storage
>>>> as raw block devices. PostgreSQL runs unmodified on ZFS pools built on
>>>> these block devices:
>>>>
>>>> PostgreSQL -> ZFS -> NBD -> ZeroFS -> S3
>>>>
>>>> By providing block-level access and leveraging ZFS's caching capabilities
>>>> (L2ARC), we can achieve microsecond latencies despite the underlying
>>>> storage being in S3.
>>>>
>>>> ## Performance Results
>>>>
>>>> Here are pgbench results from PostgreSQL running on this setup:
>>>>
>>>> ### Read/Write Workload
>>>>
>>>> ```
>>>> postgres@ubuntu-16gb-fsn1-1:/root$ pgbench -c 50 -j 15 -t 100000 example
>>>> pgbench (16.9 (Ubuntu 16.9-0ubuntu0.24.04.1))
>>>> starting vacuum...end.
>>>> transaction type: <builtin: TPC-B (sort of)>
>>>> scaling factor: 50
>>>> query mode: simple
>>>> number of clients: 50
>>>> number of threads: 15
>>>> maximum number of tries: 1
>>>> number of transactions per client: 100000
>>>> number of transactions actually processed: 5000000/5000000
>>>> number of failed transactions: 0 (0.000%)
>>>> latency average = 0.943 ms
>>>> initial connection time = 48.043 ms
>>>> tps = 53041.006947 (without initial connection time)
>>>> ```
>>>>
>>>> ### Read-Only Workload
>>>>
>>>> ```
>>>> postgres@ubuntu-16gb-fsn1-1:/root$ pgbench -c 50 -j 15 -t 100000 -S example
>>>> pgbench (16.9 (Ubuntu 16.9-0ubuntu0.24.04.1))
>>>> starting vacuum...end.
>>>> transaction type: <builtin: select only>
>>>> scaling factor: 50
>>>> query mode: simple
>>>> number of clients: 50
>>>> number of threads: 15
>>>> maximum number of tries: 1
>>>> number of transactions per client: 100000
>>>> number of transactions actually processed: 5000000/5000000
>>>> number of failed transactions: 0 (0.000%)
>>>> latency average = 0.121 ms
>>>> initial connection time = 53.358 ms
>>>> tps = 413436.248089 (without initial connection time)
>>>> ```
>>>>
>>>> These numbers are with 50 concurrent clients and the actual data stored in
>>>> S3. Hot data is served from ZFS L2ARC and ZeroFS's memory caches, while
>>>> cold data comes from S3.
>>>>
>>>> ## How It Works
>>>>
>>>> 1. ZeroFS exposes NBD devices (e.g., /dev/nbd0) that PostgreSQL/ZFS can
>>>>    use like any other block device
>>>> 2. Multiple cache layers hide S3 latency:
>>>>    a. ZFS ARC/L2ARC for frequently accessed blocks
>>>>    b. ZeroFS memory cache for metadata and hot data
>>>>    c. Optional local disk cache
>>>> 3. All data is encrypted (ChaCha20-Poly1305) before hitting S3
>>>> 4. Files are split into 128KB chunks for insertion into ZeroFS' LSM-tree
>>>>
>>>> ## Geo-Distributed PostgreSQL
>>>>
>>>> Since each region can run its own ZeroFS instance, you can create
>>>> geographically distributed PostgreSQL setups.
>>>>
>>>> Example architectures:
>>>>
>>>> Architecture 1:
>>>>
>>>>                  PostgreSQL Client
>>>>                         |
>>>>                         | SQL queries
>>>>                         |
>>>>                  +--------------+
>>>>                  |   PG Proxy   |
>>>>                  |  (HAProxy/   |
>>>>                  |  PgBouncer)  |
>>>>                  +--------------+
>>>>                    /          \
>>>>                   /            \
>>>>          Synchronous        Synchronous
>>>>          Replication        Replication
>>>>                 /                \
>>>>                /                  \
>>>>   +---------------+        +---------------+
>>>>   | PostgreSQL 1  |        | PostgreSQL 2  |
>>>>   |   (Primary)   |◄------►|   (Standby)   |
>>>>   +---------------+        +---------------+
>>>>           |                        |
>>>>           | POSIX filesystem ops   |
>>>>           |                        |
>>>>   +---------------+        +---------------+
>>>>   |  ZFS Pool 1   |        |  ZFS Pool 2   |
>>>>   | (3-way mirror)|        | (3-way mirror)|
>>>>   +---------------+        +---------------+
>>>>     /     |     \            /     |     \
>>>>    /      |      \          /      |      \
>>>> NBD:10809 NBD:10810 NBD:10811 NBD:10812 NBD:10813 NBD:10814
>>>>     |         |         |         |         |         |
>>>> +--------++--------++--------++--------++--------++--------+
>>>> |ZeroFS 1||ZeroFS 2||ZeroFS 3||ZeroFS 4||ZeroFS 5||ZeroFS 6|
>>>> +--------++--------++--------++--------++--------++--------+
>>>>     |         |         |         |         |         |
>>>> S3-Region1 S3-Region2 S3-Region3 S3-Region4 S3-Region5 S3-Region6
>>>>  (us-east)  (eu-west) (ap-south)  (us-west) (eu-north)  (ap-east)
>>>>
>>>> Architecture 2:
>>>>
>>>> PostgreSQL Primary (Region 1) ←→ PostgreSQL Standby (Region 2)
>>>>              \                        /
>>>>               \                      /
>>>>                Same ZFS Pool (NBD)
>>>>                        |
>>>>                 6 Global ZeroFS
>>>>                        |
>>>>                   S3 Regions
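>>>>
>>>> As a minimal sketch, the synchronous replication in Architecture 1 could
>>>> be configured like this on the primary (the standby name "pg2", the
>>>> replication user, and the network are illustrative, not part of the
>>>> project):
>>>>
>>>> ```
>>>> # postgresql.conf on PostgreSQL 1 (primary)
>>>> synchronous_commit = on
>>>> synchronous_standby_names = 'FIRST 1 (pg2)'   # wait for the standby
>>>>
>>>> # pg_hba.conf: allow the standby's replication connections
>>>> host  replication  replicator  10.0.2.0/24  scram-sha-256
>>>> ```
>>>>
>>>> The standby would then set application_name=pg2 in its primary_conninfo
>>>> so the primary can wait on it by name.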
>>>>
>>>> The main advantages I see are:
>>>>
>>>> 1. Dramatic cost reduction for large datasets
>>>> 2. Simplified geo-distribution
>>>> 3. Infinite storage capacity
>>>> 4. Built-in encryption and compression
>>>>
>>>> Looking forward to your feedback and questions!
>>>>
>>>> Best,
>>>> Pierre
>>>>
>>>> P.S. The full project includes a custom NFS filesystem too.