Re: Hybrid Hash/Nested Loop joins and caching results from subplans

David Rowley Wed, 11 Nov 2020 18:37:10 -0800

On Tue, 10 Nov 2020 at 12:55, David Rowley <[email protected]> wrote:
>
> On Tue, 10 Nov 2020 at 12:49, Tom Lane <[email protected]> wrote:
> >
> > Alvaro Herrera <[email protected]> writes:
> > > Are you taking into account the possibility that generated machine code
> > > is a small percent slower out of mere bad luck?  I remember someone
> > > suggesting that they can make code 2% faster or so by inserting random
> > > no-op instructions in the binary, or something like that.  So if the
> > > difference between v8 and v9 is that small, then it might be due to this
> > > kind of effect.
> >
> > Yeah.  I believe what this arises from is good or bad luck about relevant
> > tight loops falling within or across cache lines, and that sort of thing.
> > We've definitely seen performance changes up to a couple percent with
> > no apparent change to the relevant code.
>
> It possibly is this issue.
>
> Normally how I build up my confidence in which is faster is why just
> rebasing on master as it advances and see if the winner ever changes.
> The theory here is if one patch is consistently the fastest, then
> there's more chance if there being a genuine reason for it.


I kicked off a script last night that ran benchmarks on master, v8 and
v9 of the patch on 1 commit per day for the past 30 days since
yesterday.  The idea here is that as the code changes that if the
performance differences are due to code alignment then there should be
enough churn in 30 days to show if this is the case.

The quickly put together script is attached. It would need quite a bit
of modification to run on someone else's machine.

This took about 20 hours to run.  I found that v8 is faster on 28 out
of 30 commits. In the two cases where v9 was faster, v9 took 99.8% and
98.5% of the time of v8.   In the 28 cases where v8 was faster it was
generally about 2-4% faster, but a couple of times 8-10% faster. Full
results attached in .csv file. Also the query I ran to compare the
results once loaded into Postgres.

David

#!/bin/bash
seconds=30
scale=100000

pg_ctl stop -D pgdata -w

for sha in e578c17d8 ae0f7b11f b94109ce3 7f4708818 bc49f8780 540849814 
929c69aa1 f49b85d78 bbb927b4d 555eb1a4f f8721bd75 83d727e5b 21d36747d ba9f18abd 
20d3fe900 f893e68d7 ad1c36b07 f90149e62 b401fa206 f90e80b91 b17ff07aa 5b3dca096 
bf797a8d9 113d3591b 5b7bfc397 5ee180a39 b4c9695e7 8b39345a9 8f113698b d2e4bf688
do
        cd ~/pg_src
        git reset --hard
        git clean -f
        git checkout $sha
        ./configure --prefix=/home/drowley/pg > /dev/null
        for branch in master resultcache_v8 resultcache_v9
        do
                cd ~/pg_src
                git reset --hard
                git clean -f
                for file in /home/drowley/$branch/*
                do
                        patch -p1 < $file
                done
                make clean -s
                make -j -s
                make install -s
                cd contrib/pg_prewarm
                make -j -s
                make install -s
                cd
                sleep 1
                # create database and load data when doing the first branch
                if [ $branch = master ]
                then
                        rm -rf pgdata
                        initdb -D pgdata > /dev/null
                        cp postgresql.conf pgdata
                        pg_ctl start -D pgdata -l pg.log
                        
                        psql -c "drop table if exists hundredk, lookup1, 
lookup100;" postgres
                        psql -c "create table hundredk (hundredk int, tenk int, 
thousand int, hundred int, ten int, one int);" postgres
                        psql -c "insert into hundredk select 
x%100000,x%10000,x%1000,x%100,x%10,1 from generate_Series(1,$scale) x;" postgres

                        psql -c "create table lookup100 (a int);" postgres
                        psql -c "insert into lookup100 select x from 
generate_Series(1,$scale)x,generate_Series(1,100);" postgres
                        psql -c "create index on lookup100 (a);" postgres

                        psql -c "create table lookup1 (a int);" postgres
                        psql -c "insert into lookup1 select x from 
generate_Series(1,$scale)x;" postgres
                        psql -c "create index on lookup1 (a);" postgres

                        psql -c "vacuum analyze lookup1, lookup100, hundredk;" 
postgres
                        psql -c "create extension pg_prewarm;" postgres
                        pg_ctl stop -D pgdata -w
                fi

                pg_ctl start -D pgdata -l pg.log -w
                psql -c "select pg_prewarm('lookup1'), pg_prewarm('lookup100'), 
pg_prewarm('hundredk');" postgres
                for tbl in lookup1 lookup100
                do
                        for target in "count(*)" "count(l.a)" "'*'"
                        do
                                for col in thousand hundred ten one
                                do
                                        echo "select $target from hundredk hk 
inner join $tbl l on hk.$col = l.a" > bench.sql
                                        echo Testing $sha $branch $tbl $target 
$col >> bench.log
                                        pgbench -n -T $seconds -f bench.sql 
postgres | grep latency >> bench.log
                                done
                        done
                done
                pg_ctl stop -D pgdata -w
        done
done

resultcache_small_multiple_version.csv
Description: MS-Excel spreadsheet

git log --format="%H,%ai" --date=local > /home/drowley/commit.txt
psql postgres
# create table commits (sha text, author_time timestamptz);
# copy commits from '/home/drowley/pg_src/commit.txt' with(format csv);
# select string_agg(left(sha,9),' ' order by author_time) from (select * from 
(select sha,author_time at time zone 'Pacific/Auckland' 
author_time,row_number() over(partition by date(author_time at time zone 
'Pacific/Auckland') order by author_time) rn from commits)a where rn = 1 order 
by author_time desc limit 30) a;

Use these in script: e578c17d8 ae0f7b11f b94109ce3 7f4708818 bc49f8780 
540849814 929c69aa1 f49b85d78 bbb927b4d 555eb1a4f f8721bd75 83d727e5b 21d36747d 
ba9f18abd 20d3fe900 f893e68d7 ad1c36b07 f90149e62 b401fa206 f90e80b91 b17ff07aa 
5b3dca096 bf797a8d9 113d3591b 5b7bfc397 5ee180a39 b4c9695e7 8b39345a9 8f113698b 
d2e4bf688

CREATE TABLE resultcache_bench_small (
    commit text,
    tbl text,
    target text,
    col text,
    latency_master numeric(10,3),
    latency_v8 numeric(10,3),
    latency_v9 numeric(10,3)
);

postgres=# select commit, sum(latency_master) as master, sum(latency_v8) v8, 
sum(latency_v9) v9, round(avg(latency_v8/latency_master)*100,1) as 
v8_vs_master, round(avg(latency_v9/latency_master)*100,1) as v9_vs_master, 
round(avg(latency_v8/latency_v9)*100,1) as v8_vs_v9, 
round(avg(latency_v9/latency_v8)*100,1) as v9_vs_v8 from 
resultcache_bench_small group by 1;
  commit   |  master   |    v8     |    v9     | v8_vs_master | v9_vs_master | 
v8_vs_v9 | v9_vs_v8
-----------+-----------+-----------+-----------+--------------+--------------+----------+----------
 5b3dca096 | 16820.478 | 11507.545 | 11578.571 |         73.9 |         79.2 |  
   94.3 |    106.4
 f893e68d7 | 16595.615 | 11252.886 | 12065.309 |         73.8 |         82.1 |  
   90.7 |    110.7
 b94109ce3 | 16795.774 | 11330.868 | 11446.921 |         74.0 |         79.2 |  
   94.6 |    106.1
 f90e80b91 | 17004.737 | 11551.900 | 11316.972 |         77.2 |         77.3 |  
  100.3 |     99.8
 7f4708818 | 16779.402 | 11235.536 | 11117.761 |         76.5 |         79.3 |  
   97.3 |    103.0
 ba9f18abd | 17389.321 | 11623.239 | 11457.698 |         74.3 |         77.9 |  
   96.6 |    104.0
 f8721bd75 | 16818.878 | 11460.189 | 11368.843 |         74.5 |         78.2 |  
   96.1 |    104.3
 8b39345a9 | 16419.371 | 11467.491 | 11092.032 |         75.5 |         76.9 |  
   99.3 |    101.1
 5b7bfc397 | 16226.153 | 11273.428 | 11369.520 |         75.5 |         79.2 |  
   96.5 |    104.1
 e578c17d8 | 17463.136 | 11295.097 | 11225.809 |         73.5 |         75.6 |  
   98.1 |    102.1
 f49b85d78 | 16773.226 | 11473.760 | 11044.829 |         76.3 |         79.8 |  
   96.9 |    103.7
 d2e4bf688 | 16502.918 | 11243.120 | 11070.934 |         74.7 |         77.8 |  
   97.1 |    103.4
 ad1c36b07 | 16902.996 | 11500.667 | 11092.086 |         74.0 |         75.8 |  
   98.4 |    101.9
 83d727e5b | 16717.950 | 11725.605 | 11658.286 |         76.4 |         77.8 |  
   98.4 |    101.7
 b4c9695e7 | 16523.515 | 11233.828 | 11176.714 |         74.2 |         78.2 |  
   95.9 |    104.7
 113d3591b | 16792.197 | 11381.005 | 11301.595 |         76.0 |         78.8 |  
   97.5 |    102.9
 bc49f8780 | 16782.435 | 11405.384 | 11093.354 |         76.2 |         77.8 |  
   98.7 |    101.5
 20d3fe900 | 16592.252 | 11703.298 | 11215.672 |         76.5 |         78.3 |  
   98.9 |    101.9
 b17ff07aa | 17246.041 | 11726.405 | 11376.747 |         72.1 |         74.0 |  
   98.0 |    102.2
 f90149e62 | 17118.627 | 11244.809 | 11173.333 |         73.6 |         75.3 |  
   98.4 |    101.8
 5ee180a39 | 16397.251 | 11523.314 | 11049.711 |         76.3 |         78.9 |  
   98.0 |    102.6
 bf797a8d9 | 16793.719 | 11505.530 | 11371.520 |         75.2 |         77.4 |  
   98.0 |    102.3
 b401fa206 | 17226.204 | 11306.590 | 11350.644 |         73.5 |         75.5 |  
   97.9 |    102.3
 21d36747d | 16669.999 | 11713.185 | 11260.066 |         77.4 |         76.5 |  
  101.6 |     98.5
 bbb927b4d | 17096.479 | 11711.497 | 11415.711 |         72.9 |         75.7 |  
   97.2 |    103.3
 929c69aa1 | 16613.614 | 11372.493 | 11033.726 |         76.4 |         80.3 |  
   96.6 |    104.1
 555eb1a4f | 17091.994 | 11694.371 | 11403.378 |         72.5 |         75.5 |  
   97.0 |    103.5
 ae0f7b11f | 17052.744 | 11277.772 | 11755.258 |         74.9 |         81.6 |  
   92.8 |    108.1
 8f113698b | 16129.810 | 11321.365 | 11138.674 |         76.5 |         78.7 |  
   98.0 |    102.3
 540849814 | 17129.610 | 11392.372 | 11024.894 |         75.2 |         79.1 |  
   96.6 |    104.1
(30 rows)

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

Reply via email to