On 3/8/26 07:16, Alexandre Felipe wrote:
> Hi Tomas,
>
>> I've decided to run a couple tests, trying to reproduce some of the
>> behaviors described in your (Felipe's) messages.
>> ...
>> I'm attaching the full scripts, raw results, and PDFs with a nicer
>> version of the results.
>
>> The results are pretty positive. For random data (which is about the
>> worst case for I/O), it's consistently faster than master. Yes, the
>> gains with 8 workers is not as significant as with 1 worker. For
>> example, it may look like this:
>>
>>                master  prefetch
>>  1 worker:       2960      1898    64%
>>  8 workers:      5585      5361    96%
>
> branch: patched
> data: random
> io: buffered
>
>              patched             master
> iomethod  io_uring  worker  io_uring  worker
> workers
>        1      1.52    1.29      2.79    2.75
>        2      1.77    1.63      3.03    3.04
>        4      2.36    4.24      3.44    3.40
>        8      3.60    8.53      4.30    4.30
>
> They are about the same for 1 worker, but degrade as the number of
> workers increases. To be honest, I was expecting this behaviour with
> direct IO, not with buffered. But as you pointed out, with io_uring
> the issue goes away.
>
> Lock contention in pgaio_worker_submit_internal?
> Or maybe nsync > 0 at the bottom of the function?
> (in src/backend/storage/aio/method_worker.c)
>
Yes, this seems to be contention on the lock protecting the submission
queue, causing regressions when data gets into the page cache. That is
consistent with the observation that io_uring doesn't show the same
regression, simply because it doesn't have the shared queue. We already
have a fix for that in our working branch; it should be part of the next
patch version. I'm not sure why (nsync > 0) would be an issue, though.
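For illustration, here's a minimal model of the contention pattern (a
hypothetical sketch, not the actual method_worker.c code; all names are
made up): with a single lock-protected submission queue, every submitter
serializes on the same lock, so throughput flattens as submitters are
added, while per-submitter queues (analogous to io_uring's per-context
rings) have no shared lock to fight over.

```python
# Hypothetical model of a shared, lock-protected submission queue vs.
# per-submitter queues. Not PostgreSQL code; names are invented.
import threading
from collections import deque

class SharedQueue:
    """One queue, one lock: all submitters serialize here."""
    def __init__(self):
        self.lock = threading.Lock()
        self.items = deque()

    def submit(self, io):
        with self.lock:          # the contention point under load
            self.items.append(io)

class ShardedQueues:
    """One queue per submitter: no cross-submitter lock contention."""
    def __init__(self, nshards):
        self.shards = [deque() for _ in range(nshards)]

    def submit(self, shard, io):
        self.shards[shard].append(io)  # only this submitter touches it

def run(nworkers, per_worker):
    shared = SharedQueue()
    sharded = ShardedQueues(nworkers)

    def worker(i):
        for j in range(per_worker):
            shared.submit((i, j))      # contends with all other workers
            sharded.submit(i, (i, j))  # contends with nobody

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(shared.items), sum(len(s) for s in sharded.shards)

print(run(8, 1000))  # both deliver all 8000 submissions; only the
                     # shared queue serializes the workers on one lock
```

Both variants deliver every submission; the difference is only in how
much the submitters serialize on a shared lock, which is the shape of
the regression in the buffered-IO numbers above.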

regards
--
Tomas Vondra