Hi, Konstantin

>Hi,
>I really think that we need to move to global caches (and especially catalog
>caches) in Postgres.
>Modern NUMA servers may have hundreds of cores, and to utilize all of them
>we may need to start a large number (hundreds) of backends.
>The memory overhead of the local cache multiplied by 1000 can be quite significant.

Yeah, thank you for the comment.


>I am quite skeptical concerning the performance results you have provided.
>Once the dataset completely fits in memory (which is true in your case),
>select-only pgbench with prepared statements should be about two times faster
>than without prepared statements. And in your case performance with prepared
>statements is even worse.
>
>I wonder if you have repeated each measurement multiple times, to make sure
>that it is not just a fluctuation.
>Also, which PostgreSQL configuration did you use? If it is the default
>postgresql.conf with a 128MB shared_buffers size, then you are measuring the
>time of disk access, and the catalog cache is not relevant for performance in
>this case.
>
>Below are the results I got with pgbench scale 100 (with scale 10 the results
>are slightly better) on my desktop with just 16GB of RAM and 4 cores:
>
>                                    | master branch | prototype | proto/master (%)
>    ------------------------------------------------------------------------------
>    pgbench -c10 -T60 -Msimple -S   | 187189        | 182123    | 97
>    pgbench -c10 -T60 -Msimple      | 15495         | 15112     | 97
>    pgbench -c10 -T60 -Mprepared -S | 98273         | 92810     | 94
>    pgbench -c10 -T60 -Mprepared    | 25796         | 25169     | 97
>
>As you can see, there are no surprises here: the negative effect of the shared
>cache is largest for non-prepared selects (because selects themselves are much
>faster than updates, and during compilation we have to access relations
>multiple times).
>

As you pointed out, my shared_buffers and scaling factor were too small.
I ran the benchmark again with new settings, and my results seem to reproduce
yours.

On a machine with 128GB of memory and 16 cores, shared_buffers was set to 32GB
and the database was initialized with -s 100.
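For reference, the setup above corresponds roughly to the following commands
(the database name and the exact postgresql.conf editing step are my
assumptions, not taken from the original run):

```shell
# Assumed sketch of the benchmark setup described above.
# In postgresql.conf (before restarting the server):
#   shared_buffers = 32GB

# Initialize pgbench tables at scale factor 100 (roughly 1.5GB of data)
pgbench -i -s 100 postgres

# Example run: prepared-statement, SELECT-only, 48 clients, 16 threads, 60s
pgbench -c 48 -T 60 -j 16 -M prepared -S postgres
```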

TPS results follow (mean of 10 measurements, rounded to the nearest integer):

                                        | master branch | proto  | proto/master (%)
  ---------------------------------------------------------------------------------
  pgbench -c48 -T60 -j16 -Msimple -S    | 122140        | 114103 | 93
  pgbench -c48 -T60 -j16 -Msimple       | 7858          | 7822   | 100
  pgbench -c48 -T60 -j16 -Mprepared -S  | 221740        | 210778 | 95
  pgbench -c48 -T60 -j16 -Mprepared     | 9257          | 8998   | 97
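For clarity, the proto/master column above is simply the prototype's mean TPS
as a whole percentage of master's mean TPS; a minimal sketch (the function name
is mine):

```python
# Sketch: how the proto/master (%) column is derived from the raw TPS
# numbers (mean of 10 pgbench runs, rounded to the nearest percent).
def ratio_percent(proto_tps: float, master_tps: float) -> int:
    """Return prototype TPS as a whole percentage of master TPS."""
    return round(proto_tps / master_tps * 100)

# e.g. the -Msimple -S row:
print(ratio_percent(114103, 122140))  # → 93
```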
  
As you mentioned, the SELECT-only workload shows more overhead.

(By the way, I think in a later email you mentioned results with a larger
number of concurrent clients. I'll try to check that case as well.)

====================
Takeshi Ideriha
Fujitsu Limited
