On Sat, May 18, 2024 at 8:09 AM Thomas Munro <thomas.mu...@gmail.com> wrote:
> On Sat, May 18, 2024 at 1:00 AM Alexander Lakhin <exclus...@gmail.com> wrote:
> > I decided to compare v17 vs v16 performance (as I did the last year [1])
> > and discovered that v17 loses to v16 in the pg_tpcds (s64da_tpcds)
> > benchmark, query15 (and several others, but I focused on this one):
> > Best pg-src-master--.* worse than pg-src-16--.* by 52.2 percents (229.84 > 151.03): pg_tpcds.query15
> > Average pg-src-master--.* worse than pg-src-16--.* by 53.4 percents (234.20 > 152.64): pg_tpcds.query15
> > Please look at the full html report attached in case you're interested.
> >
> > (I used my pg-mark tool to measure/analyze performance, but I believe the
> > same results can be seen without it.)
>
> Will investigate, but if it's easy for you to rerun, does it help if
> you increase Linux readahead, eg blockdev --setra setting?

Andres happened to have TPC-DS handy, and reproduced that regression in q15. We tried some stuff and figured out that it requires parallel_leader_participation=on, i.e. this looks like some kind of parallel fairness and/or timing problem. It seems to be a question of which worker ends up processing the matching rows: the leader gets a ~10ms head start, and may be a little more greedy with the new streaming code. He tried reordering the table contents and then saw v17 beat v16.

So for q15, initial indications are that this isn't a fundamental regression; it's just a test that is sensitive to some arbitrary conditions. I'll try to figure out some more details about that, i.e. whether it is being too greedy on small-ish tables. More generally, I do wonder about the interactions between the heuristics and batching working at different levels (OS, seq scan, read stream; hence my earlier readahead question, which is likely a red herring), and how there might be unintended consequences/interference patterns. But this particular case seems more data dependent.
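
To make the fairness hypothesis concrete, here is a toy model in Python. It is only a sketch under made-up assumptions, not PostgreSQL's actual parallel block allocator or read stream code: the block costs, the chunk sizes, the 10-unit worker start delay and the names simulate() and make_table() are all hypothetical. It shows how, in such a model, a participant that starts early and claims blocks in larger batches can end up with all the expensive (matching) blocks when they are clustered at the front of the table, and how reordering the table changes the elapsed time.

```python
import heapq

def make_table(n_blocks, expensive, cheap_cost=1.0, expensive_cost=20.0):
    """Per-block processing cost; 'expensive' holds the block numbers
    that contain matching rows (hypothetical costs, arbitrary units)."""
    return [expensive_cost if b in expensive else cheap_cost
            for b in range(n_blocks)]

def simulate(block_costs, participants):
    """participants: list of (name, start_time, chunk_size).

    Whenever a participant becomes free, it claims the next chunk_size
    blocks from a shared position and works through them.
    Returns (elapsed_time, work_done_per_participant)."""
    next_block = 0
    work = {name: 0.0 for name, _, _ in participants}
    finish = {name: start for name, start, _ in participants}
    # Min-heap of (time the participant is next free, participant index).
    heap = [(start, i) for i, (_, start, _) in enumerate(participants)]
    heapq.heapify(heap)
    while next_block < len(block_costs):
        now, i = heapq.heappop(heap)
        name, _, chunk = participants[i]
        claimed = block_costs[next_block:next_block + chunk]
        next_block += len(claimed)
        cost = sum(claimed)
        work[name] += cost
        finish[name] = now + cost
        heapq.heappush(heap, (now + cost, i))
    return max(finish.values()), work

# Assumption: the leader starts at t=0 and grabs 16 blocks at a time,
# while two workers start at t=10 and grab 4 blocks at a time.
greedy_leader = [("leader", 0.0, 16), ("worker1", 10.0, 4), ("worker2", 10.0, 4)]
modest_leader = [("leader", 0.0, 4), ("worker1", 10.0, 4), ("worker2", 10.0, 4)]

front = make_table(64, expensive=set(range(0, 8)))    # matches at the start
back = make_table(64, expensive=set(range(56, 64)))   # same data, reordered

for label, table, parts in [
    ("matches at front, greedy leader", front, greedy_leader),
    ("matches at back,  greedy leader", back, greedy_leader),
    ("matches at front, modest leader", front, modest_leader),
]:
    elapsed, work = simulate(table, parts)
    print(label, "elapsed:", elapsed, "work:", work)
```

In this toy setup the same total amount of work takes noticeably longer when the expensive blocks sit where the early, greedy participant claims them, which is the kind of data-dependent sensitivity described above. It says nothing about whether that is what really happens in q15.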