On 8/13/25 16:44, Andres Freund wrote:
> Hi,
> 
> On 2025-08-13 14:15:37 +0200, Tomas Vondra wrote:
>> In fact, I believe this is about io_method. I initially didn't see the
>> difference you described, and then I realized I set io_method=sync to
>> make it easier to track the block access. And if I change io_method to
>> worker, I get different stats, that also change between runs.
>>
>> With "sync" I always get this (after a restart):
>>
>>    Buffers: shared hit=7435 read=52801
>>
>> while with "worker" I get this:
>>
>>    Buffers: shared hit=4879 read=52801
>>    Buffers: shared hit=5151 read=52801
>>    Buffers: shared hit=4978 read=52801
>>
>> So not only it changes run to tun, it also does not add up to 60236.
> 
> This is reproducible on master? If so, how?
> 
> 
>> I vaguely recall I ran into this some time ago during AIO benchmarking,
>> and IIRC it's due to how StartReadBuffersImpl() may behave differently
>> depending on I/O started earlier. It only calls PinBufferForBlock() in
>> some cases, and PinBufferForBlock() is what updates the hits.
> 
> Hm, I don't immediately see an issue there. The only case we don't call
> PinBufferForBlock() is if we already have pinned the relevant buffer in a
> prior call to StartReadBuffersImpl().
> 
> 
> If this happens only with the prefetching patch applied, is is possible that
> what happens here is that we occasionally re-request buffers that already in
> the process of being read in? That would only happen with a read stream and
> io_method != sync (since with sync we won't read ahead). If we have to start
> reading in a buffer that's already undergoing IO we wait for the IO to
> complete and count that access as a hit:
> 
>       /*
>        * Check if we can start IO on the first to-be-read buffer.
>        *
>        * If an I/O is already in progress in another backend, we want to wait
>        * for the outcome: either done, or something went wrong and we will
>        * retry.
>        */
>       if (!ReadBuffersCanStartIO(buffers[nblocks_done], false))
>       {
> ...
>               /*
>                * Report and track this as a 'hit' for this backend, even 
> though it
>                * must have started out as a miss in PinBufferForBlock(). The 
> other
>                * backend will track this as a 'read'.
>                */
> ...
>               if (persistence == RELPERSISTENCE_TEMP)
>                       pgBufferUsage.local_blks_hit += 1;
>               else
>                       pgBufferUsage.shared_blks_hit += 1;
> ...
> 
> 

I think it has to be this. It only happens with io_method != sync, and
only with effective_io_concurrency > 1. At first I was wondering why I
can't reproduce this for seqscan/bitmapscan, but then I realized those
plans never visit the same block repeatedly - indexscans do that. It's
also not surprising it's timing-sensitive, as it likely depends on how
fast the worker happens to start/complete requests.

What would be a good way to "prove" it really is this?


regards

-- 
Tomas Vondra



Reply via email to