Re: [PING] [PATCH v2] parallel pg_restore: avoid disk seeks when jumping short distance forward

Nathan Bossart Tue, 10 Jun 2025 14:54:07 -0700

On Mon, Jun 09, 2025 at 10:09:57PM +0200, Dimitrios Apostolou wrote:
> Fix by avoiding forward seeks for jumps of less than 1MB forward.
> Do instead sequential reads.
> 
> Performance gain can be significant, depending on the size of the dump
> and the I/O subsystem. On my local NVMe drive, read speeds for that
> phase of pg_restore increased from 150MB/s to 3GB/s.


I was curious about what exactly was leading to the performance gains you
are seeing.  This page has an explanation:

        https://www.mjr19.org.uk/IT/fseek.html

I also wrote a couple of test programs to show the difference between
fseeko-ing and fread-ing through a file in steps of various sizes (n bytes
per step).  On a Linux machine, I see this:

     log2(n) | fseeko  | fread
    ---------+---------+-------
           1 | 109.288 | 5.528
           2 |  54.881 | 2.848
           3 |   27.65 | 1.504
           4 |  13.953 | 0.834
           5 |     7.1 |  0.49
           6 |   3.665 | 0.322
           7 |   1.944 | 0.244
           8 |   1.085 | 0.201
           9 |   0.658 | 0.185
          10 |   0.443 | 0.175
          11 |   0.253 | 0.171
          12 |   0.102 | 0.162
          13 |   0.075 |  0.13
          14 |   0.061 | 0.114
          15 |   0.054 |   0.1

So, fseeko() starts winning at around 4096 bytes (log2(n) = 12).  On macOS,
the differences aren't quite as dramatic, but 4096 bytes is the break-even
point there, too.  I imagine there's a buffer around that size somewhere...
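
For reference, here is a minimal sketch of the kind of test described above.
It is not the actual test programs: the names walk_file(), timed_walk() and
now_sec() are hypothetical, and it assumes a POSIX system with fseeko(),
ftello() and clock_gettime().  It steps through an already-open file n bytes
at a time, either by seeking or by reading and discarding.

```c
#define _FILE_OFFSET_BITS 64		/* 64-bit off_t where that is optional */
#define _POSIX_C_SOURCE 200809L		/* fseeko(), ftello(), clock_gettime() */

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <time.h>

/* Seconds from a monotonic clock (hypothetical helper). */
static double
now_sec(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (double) ts.tv_sec + (double) ts.tv_nsec / 1e9;
}

/*
 * Rewind fp and step through it "step" bytes at a time, using fseeko()
 * if use_seek is nonzero and fread() otherwise.  Returns the number of
 * whole steps taken, or -1 on error.
 */
static long
walk_file(FILE *fp, size_t step, int use_seek)
{
	off_t		size;
	off_t		pos;
	long		steps = 0;
	char	   *buf;

	assert(step > 0);

	/* Find the file size, then go back to the start. */
	if (fseeko(fp, 0, SEEK_END) != 0 ||
		(size = ftello(fp)) < 0 ||
		fseeko(fp, 0, SEEK_SET) != 0)
		return -1;

	buf = malloc(step);
	if (buf == NULL)
		return -1;

	for (pos = 0; pos + (off_t) step <= size; pos += (off_t) step)
	{
		if (use_seek)
		{
			if (fseeko(fp, (off_t) step, SEEK_CUR) != 0)
			{
				steps = -1;
				break;
			}
		}
		else if (fread(buf, 1, step, fp) != step)
		{
			steps = -1;
			break;
		}
		steps++;
	}

	free(buf);
	return steps;
}

/* Time one pass over the file; the step count is returned via *steps. */
static double
timed_walk(FILE *fp, size_t step, int use_seek, long *steps)
{
	double		start = now_sec();

	*steps = walk_file(fp, step, use_seek);
	return now_sec() - start;
}
```

Calling timed_walk() with step = 1 << k for k = 1..15, once with use_seek
set and once without, should produce a comparison of the same shape as the
table above.  The absolute numbers will depend on the machine, the file
size, and whether the file is already in the page cache.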

This doesn't fully explain the results you are seeing, but it does seem to
validate the idea.  I'm curious if you see further improvement with even
lower thresholds (e.g., 8KB, 16KB, 32KB). 
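
To make the threshold idea concrete, here is a rough sketch of a forward
skip that reads and discards for short jumps and only seeks for long ones.
This is not the code from the patch: skip_forward(), its threshold
parameter, and the 8192-byte scratch buffer are assumptions made up for
illustration.

```c
#define _FILE_OFFSET_BITS 64		/* 64-bit off_t where that is optional */
#define _POSIX_C_SOURCE 200809L		/* fseeko() */

#include <stdio.h>
#include <sys/types.h>

/*
 * Advance fp by "distance" bytes.  Jumps shorter than "threshold" are
 * done by reading and discarding data; longer jumps use fseeko().
 * Returns 0 on success, -1 on error or short read.
 */
static int
skip_forward(FILE *fp, off_t distance, off_t threshold)
{
	char		buf[8192];		/* scratch buffer, size chosen arbitrarily */

	if (distance < 0)
		return -1;

	/* Long jump: let the kernel reposition the file. */
	if (distance >= threshold)
		return fseeko(fp, distance, SEEK_CUR) == 0 ? 0 : -1;

	/* Short jump: consume the bytes sequentially instead of seeking. */
	while (distance > 0)
	{
		size_t		chunk = sizeof(buf);

		if (distance < (off_t) sizeof(buf))
			chunk = (size_t) distance;

		if (fread(buf, 1, chunk, fp) != chunk)
			return -1;

		distance -= (off_t) chunk;
	}

	return 0;
}
```

With something like this, trying 8KB, 16KB, or 32KB is just a matter of
changing the threshold argument, e.g. skip_forward(fp, len, 8192).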

-- 
nathan
