po 19. 8. 2019 v 13:16 odesílatel Konstantin Knizhnik < [email protected]> napsal:
> > > On 19.08.2019 11:51, Konstantin Knizhnik wrote: > > > > On 18.08.2019 11:28, Pavel Stehule wrote: > > > > ne 18. 8. 2019 v 9:02 odesílatel Konstantin Knizhnik < > [email protected]> napsal: > >> >> >> On 16.08.2019 20:17, Pavel Stehule wrote: >> >> >> >> pá 16. 8. 2019 v 16:12 odesílatel Konstantin Knizhnik < >> [email protected]> napsal: >> >>> I did more investigations of performance of global temp tables with >>> shared buffers vs. vanilla (local) temp tables. >>> >>> 1. Combination of persistent and temporary tables in the same query. >>> >>> Preparation: >>> create table big(pk bigint primary key, val bigint); >>> insert into big values >>> (generate_series(1,100000000),generate_series(1,100000000)); >>> create temp table lt(key bigint, count bigint); >>> create global temp table gt(key bigint, count bigint); >>> >>> Size of table is about 6Gb, I run this test on desktop with 16GB of RAM >>> and postgres with 1Gb shared buffers. >>> I run two queries: >>> >>> insert into T (select count(*),pk/P as key from big group by key); >>> select sum(count) from T; >>> >>> where P is (100,10,1) and T is name of temp table (lt or gt). >>> The table below contains times of both queries in msec: >>> >>> Percent of selected data >>> 1% >>> 10% >>> 100% >>> Local temp table >>> 44610 >>> 90 >>> 47920 >>> 891 >>> 63414 >>> 21612 >>> Global temp table >>> 44669 >>> 35 >>> 47939 >>> 298 >>> 59159 >>> 26015 >>> >>> As you can see, time of insertion in temporary table is almost the same >>> and time of traversal of temporary table is about twice smaller for >>> global temp table >>> when it fits in RAM together with persistent table and slightly worser >>> when it doesn't fit. >>> >>> >>> >>> 2. Temporary table only access. >>> The same system, but Postgres is configured with shared_buffers=10GB, >>> max_parallel_workers = 4, max_parallel_workers_per_gather = 4 >>> >>> Local temp tables: >>> create temp table local_temp(x1 bigint, x2 bigint, x3 bigint, x4 bigint, >>> x5 bigint, x6 bigint, x7 bigint, x8 bigint, x9 bigint); >>> insert into local_temp values >>> (generate_series(1,100000000),0,0,0,0,0,0,0,0); >>> select sum(x1) from local_temp; >>> >>> Global temp tables: >>> create global temporary table global_temp(x1 bigint, x2 bigint, x3 >>> bigint, x4 bigint, x5 bigint, x6 bigint, x7 bigint, x8 bigint, x9 bigint); >>> insert into global_temp values >>> (generate_series(1,100000000),0,0,0,0,0,0,0,0); >>> select sum(x1) from global_temp; >>> >>> Results (msec): >>> >>> Insert >>> Select >>> Local temp table 37489 >>> 48322 >>> Global temp table 44358 >>> 3003 >>> >>> So insertion in local temp table is performed slightly faster but select >>> is 16 times slower! >>> >>> Conclusion: >>> In the assumption then temp table fits in memory, global temp tables >>> with shared buffers provides better performance than local temp table. >>> I didn't consider here global temp tables with local buffers because for >>> them results should be similar with local temp tables. >>> >> >> Probably there is not a reason why shared buffers should be slower than >> local buffers when system is under low load. >> >> access to shared memory is protected by spin locks (are cheap for few >> processes), so tests in one or few process are not too important (or it is >> just one side of space) >> >> another topic can be performance on MS Sys - there are stories about not >> perfect performance of shared memory there. >> >> Regards >> >> Pavel >> >> One more test which is used to simulate access to temp tables under high >> load. >> I am using "upsert" into temp table in multiple connections. >> >> create global temp table gtemp (x integer primary key, y bigint); >> >> upsert.sql: >> insert into gtemp values (random() * 1000000, 0) on conflict(x) do update >> set y=gtemp.y+1; >> >> pgbench -c 10 -M prepared -T 100 -P 1 -n -f upsert.sql postgres >> >> >> I failed to find some standard way in pgbech to perform per-session >> initialization to create local temp table, >> so I just insert this code in pgbench code: >> >> diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c >> index 570cf33..af6a431 100644 >> --- a/src/bin/pgbench/pgbench.c >> +++ b/src/bin/pgbench/pgbench.c >> @@ -5994,6 +5994,7 @@ threadRun(void *arg) >> { >> if ((state[i].con = doConnect()) == NULL) >> goto done; >> + executeStatement(state[i].con, "create temp table >> ltemp(x integer primary key, y bigint)"); >> } >> } >> >> >> Results are the following: >> Global temp table: 117526 TPS >> Local temp table: 107802 TPS >> >> >> So even for this workload global temp table with shared buffers are a >> little bit faster. >> I will be pleased if you can propose some other testing scenario. >> > > please, try to increase number of connections. > > > With 20 connections and 4 pgbench threads results are similar: 119k TPS > for global temp tables and 115k TPS for local temp tables. > > I have tried yet another scenario: read-only access to temp tables: > > \set id random(1,10000000) > select sum(y) from ltemp where x=:id; > > Tables are created and initialized in pgbench session startup: > > knizhnik@knizhnik:~/postgresql$ git diff > diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c > index 570cf33..95295b0 100644 > --- a/src/bin/pgbench/pgbench.c > +++ b/src/bin/pgbench/pgbench.c > @@ -5994,6 +5994,8 @@ threadRun(void *arg) > { > if ((state[i].con = doConnect()) == NULL) > goto done; > + executeStatement(state[i].con, "create temp table > ltemp(x integer primary key, y bigint)"); > + executeStatement(state[i].con, "insert into ltemp > values (generate_series(1,1000000), generate_series(1,1000000))"); > } > } > > > Results for 10 connections with 10 million inserted records per table and > 100 connections with 1 million inserted record per table : > > #connections: > 10 > 100 > local temp > 68k > 90k > global temp, shared_buffers=1G > 63k > 61k > global temp, shared_buffers=10G 150k > 150k > > > So temporary tables with local buffers are slightly faster when data > doesn't fit in shared buffers, but significantly slower when it fits. > > > > All previously reported results were produced at my desktop. > I also run this read-only test on huge IBM server (POWER9, 2 NUMA nodes, > 176 CPU, 1Tb RAM). > > Here the difference between local and global tables is not so large: > > Local temp: 739k TPS > Global temp: 924k TPS > is not difference between local temp buffers and global temp buffers by too low value of TEMP_BUFFERS? Pavel > > -- > Konstantin Knizhnik > Postgres Professional: http://www.postgrespro.com > The Russian Postgres Company > >
