On Thu, 19 Oct 2023 at 22:29, David Rowley <dgrowle...@gmail.com> wrote: > It's hard to imagine why there would be a slowdown as this query uses > a TTSOpsMinimalTuple slot type in the patch and the unpatched version.
I shrunk down your table sizes to 10k rows instead of 1 million rows to reduce the CPU cache pressure on the queries. I ran pgbench for 1 minute on each query and did pg_prewarm on each table. Here are the times I got in milliseconds: Query master Master + 0001 compare Q1 2.576 1.979 130.17% Q2 9.546 9.941 96.03% Q3 9.069 9.536 95.10% Q4 7.285 7.208 101.07% Q5 7.585 6.904 109.86% Q6 162.253 161.434 100.51% Q7 62.507 58.922 106.08% I also noted down the slot type that nodeUnique.c is using in each of the queries: Q1 TTSOpsVirtual Q2 TTSOpsVirtual Q3 TTSOpsVirtual Q4 TTSOpsMinimalTuple Q5 TTSOpsVirtual Q6 TTSOpsMinimalTuple Q7 TTSOpsMinimalTuple So, I'm not really expecting Q4, Q6 or Q7 to change much. However, Q7 does seem to be above noise level faster and I'm not sure why. We can see that Q2 and Q3 become a bit slower. This makes sense as tts_virtual_materialize() is quite a bit more complex than heap_copy_minimal_tuple() which is a simple palloc/memcpy. We'd likely see Q2 and Q3 do better with the patched version if there were more duplicates as there'd be less tuple deforming going on because of the virtual slots. Overall, the patched version is 5.55% faster than master. However, it's pretty hard to say if we should do this or not. Q3 has a mix of varlena and byval types and that came out slower with the patched version. I've attached the script I used to get the results and the setup, which is just your tables shrunk down to 10k rows. David
#!/bin/bash psql -c "select pg_prewarm('t_int'),pg_prewarm('t_text'),pg_prewarm('t_mixed');" postgres for sql in "select distinct a,b from t_int;" "select distinct a,b from t_text;" "select distinct a,b from t_mixed;" "select distinct a,b from (select sum(a) over (order by a rows 2 preceding) a, b from t_int) q;" "select distinct a,b from (select sum(a) over (order by a rows 2 preceding) a, b from t_int order by a, b) q;" "select distinct a,b from (select string_agg(a, ', ') over (order by a rows 2 preceding) a, b from t_text) q;" "select distinct a,b from (select string_agg(left(a, 100), ', ') over (order by a rows 2 preceding) a, b from t_text) q;" do echo "set enable_hashagg=0;" > bench.sql echo "set work_mem = '10GB';" >> bench.sql echo "$sql" >> bench.sql pgbench -n -f bench.sql -M prepared -T 60 postgres | grep latency done
setup.sql
Description: Binary data