Hi, Konstantin

>Hi,
>I really think that we need to move to global caches (and especially
>catalog caches) in Postgres.
>Modern NUMA servers may have hundreds of cores, and to be able to utilize
>all of them we may need to start a large number (hundreds) of backends.
>Memory overhead of local cache multiplied by 1000 can be quite significant.
Yeah, thank you for the comment.

>I am quite skeptical concerning the performance results you have provided.
>Once the dataset completely fits in memory (which is true in your case),
>select-only pgbench with prepared statements should be about two times
>faster than without prepared statements. And in your case performance with
>prepared statements is even worse.
>
>I wonder if you have repeated each measurement multiple times, to make
>sure that it is not just a fluctuation.
>Also, which postgresql configuration have you used? If it is the default
>postgresql.conf with 128MB shared buffers, then you are measuring the time
>of disk access, and the catalog cache is not relevant for performance in
>this case.
>
>Below are results I got with pgbench scale 100 (with scale 10 results are
>slightly better) at my desktop with just 16GB of RAM and 4 cores:
>
>                                  |master branch | prototype | proto/master (%)
>-------------------------------------------------------------------------------
> pgbench -c10 -T60 -Msimple -S    | 187189       | 182123    | 97
> pgbench -c10 -T60 -Msimple       | 15495        | 15112     | 97
> pgbench -c10 -T60 -Mprepared -S  | 98273        | 92810     | 94
> pgbench -c10 -T60 -Mprepared     | 25796        | 25169     | 97
>
>As you see, there are no surprises here: the negative effect of the shared
>cache is the largest for the case of non-prepared selects (because selects
>themselves are much faster than updates, and during compilation we have to
>access relations multiple times).

As you pointed out, my shared_buffers and scaling factor were too small. I
did the benchmark again with the new settings, and my results seem to
reproduce yours. On a machine with 128GB of memory and 16 cores,
shared_buffers was set to 32GB and the database was initialized with -s100.
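The workloads being compared differ only in the pgbench protocol mode (-Msimple vs. -Mprepared) and the select-only flag (-S). As a small sketch (the helper name and its defaults are my own, not from the mail), the four command lines can be generated like this:

```python
from itertools import product

# Hypothetical helper: build the four pgbench command lines used in the
# table above (10 clients, 60 seconds per run).
def pgbench_commands(clients=10, seconds=60):
    cmds = []
    for mode, select_only in product(("simple", "prepared"), (True, False)):
        cmd = f"pgbench -c{clients} -T{seconds} -M{mode}"
        if select_only:
            cmd += " -S"  # -S restricts pgbench to select-only transactions
        cmds.append(cmd)
    return cmds

for c in pgbench_commands():
    print(c)
```

Each command would be run against both the master branch and the prototype build to produce the two TPS columns.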
TPS results follow (mean of 10 measurements; rounded to the nearest integer):

                                     |master branch | proto  | proto/master (%)
-------------------------------------------------------------------------------
 pgbench -c48 -T60 -j16 -Msimple -S  | 122140       | 114103 | 93
 pgbench -c48 -T60 -j16 -Msimple     | 7858         | 7822   | 100
 pgbench -c48 -T60 -j16 -Mprepared -S| 221740       | 210778 | 95
 pgbench -c48 -T60 -j16 -Mprepared   | 9257         | 8998   | 97

As you mentioned, the SELECT-only workloads show more overhead. (By the way,
I think in a later email you mentioned results with a larger number of
concurrent clients. I will also try to check that case.)

====================
Takeshi Ideriha
Fujitsu Limited
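As a quick sanity check (a sketch I am adding, not part of the original mail), the proto/master percentages can be recomputed from the two TPS columns, rounding off the decimal as described:

```python
# TPS numbers copied from the table above: workload -> (master, proto).
results = {
    "-Msimple -S":   (122140, 114103),
    "-Msimple":      (7858, 7822),
    "-Mprepared -S": (221740, 210778),
    "-Mprepared":    (9257, 8998),
}

# Ratio of prototype to master, rounded to a whole percent.
for workload, (master, proto) in results.items():
    print(f"{workload:14s} {round(proto / master * 100)}%")
# Prints 93%, 100%, 95%, 97%, matching the table.
```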