On 7/16/25 17:29, Peter Geoghegan wrote:
> On Wed, Jul 16, 2025 at 4:40 AM Tomas Vondra <to...@vondra.me> wrote:
>> For "uniform" data set, both prefetch patches do much better than master
>> (for low selectivities it's clearer in the log-scale chart). The
>> "complex" prefetch patch appears to have a bit of an edge for >1%
>> selectivities. I find this a bit surprising, the leaf pages have ~360
>> index items, so I wouldn't expect such impact due to not being able to
>> prefetch beyond the end of the current leaf page. But could be on
>> storage with higher latencies (this is the cloud SSD on azure).
>
> How can you say that the "complex" patch has "a bit of an edge for >1%
> selectivities"?
>
> It looks like a *massive* advantage on all "linear" test results.
> Those are only about 1/3 of all tests -- but if I'm not mistaken
> they're the *only* tests where prefetching could be expected to help a
> lot. The "cyclic" tests are adversarial/designed to make the patch
> look bad. The "uniform" tests have uniformly random heap accesses (I
> think), which can only be helped so much by prefetching.
>
> For example, with "linear_10 / eic=16 / sync", it looks like "complex"
> has about half the latency of "simple" in tests where selectivity is
> 10. The advantage for "complex" is even greater at higher
> "selectivity" values. All of the other "linear" test results look
> about the same.
>
> Have I missed something?
>

That paragraph starts with "For "uniform" data set", so the statement
about >1% selectivities applied only to that particular data set.

You're right that there's a massive difference on all the "correlated"
data sets. I believe (assume) that's caused by the same issue discussed
earlier in this thread, where the simple patch seems to issue fewer
fadvise calls. I only picked the "cyclic" data set as an example
representing this.

FWIW I suspect the difference on the "uniform" data set might be caused
by this too, because at ~5% selectivity the queries start to hit pages
multiple times (there are ~20 rows/page, hence ~5% means ~1 matching row
per page on average). But the effect is much weaker than on the
correlated data sets, of course.
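
To make the ~20 rows/page arithmetic concrete, here is a rough
back-of-the-envelope sketch. It is not part of the benchmark: the
function name is made up for illustration, and it assumes every row on
a page matches independently with probability equal to the selectivity,
which is only an approximation of the "uniform" data set.

```python
def page_hit_stats(rows_per_page: int, selectivity: float):
    """Return (expected matching rows per page, P(page has 2+ matches)).

    Assumption (illustrative only): each row matches independently with
    probability `selectivity`, i.e. a simple binomial model.
    """
    expected = rows_per_page * selectivity
    # Probability that no row on the page matches.
    p_none = (1 - selectivity) ** rows_per_page
    # Probability that exactly one row on the page matches.
    p_one = rows_per_page * selectivity * (1 - selectivity) ** (rows_per_page - 1)
    # Everything else means the page is visited for two or more rows.
    p_multi = 1 - p_none - p_one
    return expected, p_multi


if __name__ == "__main__":
    for sel in (0.01, 0.05, 0.10):
        expected, p_multi = page_hit_stats(20, sel)
        print(f"selectivity={sel:.0%}  rows/page={expected:.2f}  "
              f"P(2+ matches)={p_multi:.2f}")
```

Under this model, 5% selectivity gives one expected match per page, and
roughly a quarter of the pages hold two or more matching rows, which is
where repeated hits on the same page start to matter.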

regards

--
Tomas Vondra