On 8/9/25 01:47, Andres Freund wrote:
> Hi,
>
> On 2025-08-06 16:12:53 +0200, Tomas Vondra wrote:
>> That's quite possible. What concerns me about using tables like the
>> pgbench accounts table is reproducibility - initially the data is
>> correlated, and then it gets "randomized" by the workload. But the exact
>> pattern likely depends on the workload - how many clients, how long it
>> runs, how it correlates with vacuum, etc. Reproducing the dataset might
>> be quite tricky.
>>
>> That's why I prefer using "reproducible" data sets, and the data sets
>> with "fuzz" seem like a pretty good model. I plan to experiment with
>> adding some duplicate values / runs, possibly with two "levels" of
>> randomness (a global one shared by all runs, and smaller local
>> perturbations within each run).
>> [...]
>> Yeah, cases like that are interesting. I plan to do some randomized
>> testing, exploring "strange" combinations of parameters, looking for
>> weird behaviors like that.
>
> I'm just catching up: Isn't it a bit early to focus this much on testing? ISTM
> that the patchsets for both approaches currently have some known architectural
> issues, and that addressing them seems likely to change their performance
> characteristics.
>
Perhaps. For me, benchmarks are a way to learn about the problem and to
better understand the pros and cons of the approaches. It's possible some
of the changes will affect the performance characteristics, but I doubt
they can change the fundamental differences, e.g. those caused by the
simple approach being limited to a single leaf page.
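
The two-level randomness idea quoted above (a random global base per run,
plus small local perturbations within the run) could be sketched roughly
like this; the function and parameter names are just illustrative, not
taken from any patch:

```python
import random

def generate_dataset(n_runs, run_len, global_spread, local_spread, seed=42):
    # Hypothetical sketch: each "run" gets a random global base value,
    # and every item in the run is a small local perturbation around it.
    # A fixed seed keeps the data set reproducible across benchmark runs.
    rng = random.Random(seed)
    values = []
    for _ in range(n_runs):
        base = rng.randrange(global_spread)  # global randomness, per run
        for _ in range(run_len):
            # local randomness: small perturbations, likely to produce
            # duplicate values within a run when local_spread is small
            values.append(base + rng.randrange(-local_spread, local_spread + 1))
    return values

data = generate_dataset(n_runs=1000, run_len=50,
                        global_spread=1_000_000, local_spread=10)
```

Rerunning with the same seed regenerates the identical data set, which is
the point of preferring this over workload-randomized pgbench tables.
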
regards
--
Tomas Vondra