Hi all. I've been configuring and tuning a new server running PostgreSQL 15.3, but I'm struggling with latency that I consistently see on this new server when running fast, short queries, compared to our other server.
We're running two different versions of PostgreSQL:

- Server A: PostgreSQL 9.3
- Server B: PostgreSQL 15.3

Server B is the new server and is considerably more powerful than server A:

- Server A: 1x Intel Xeon E3-1270 3.5GHz, 2x 8GB DDR3, RAID0
- Server B: 2x Intel Xeon Platinum 8260 2.4GHz, 4x 16GB DDR4, RAID1

Server B runs Ubuntu 20.04, and I've been tweaking settings in both Linux and PostgreSQL 15.3. With the current setup, PostgreSQL 15.3 can process more than 1 million transactions per second with pgbench:

# pgbench --username postgres --select-only --client 100 --jobs 10 --time 20 test
pgbench (15.3 (Ubuntu 15.3-1.pgdg20.04+1))
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 1
query mode: simple
number of clients: 100
number of threads: 10
maximum number of tries: 1
duration: 20 s
number of transactions actually processed: 23039950
number of failed transactions: 0 (0.000%)
latency average = 0.087 ms
initial connection time = 62.536 ms
tps = 1155053.135317 (without initial connection time)

As pgbench shows, throughput is great. When testing individual queries, heavy queries (those taking a few ms) also run faster on server B than on server A. Unfortunately, for fast, short SELECT queries (< 1 ms), server A is consistently faster than server B, even though the query plans are equivalent:

Server A:

# EXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT 1 AS "a" FROM "foobar" WHERE ("foobar"."id" = 1) LIMIT 1;
                                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..8.44 rows=1 width=0) (actual time=0.008..0.008 rows=1 loops=1)
   Output: (1)
   Buffers: shared hit=5
   ->  Index Only Scan using foobar_pkey on public.foobar  (cost=0.42..8.44 rows=1 width=0) (actual time=0.008..0.008 rows=1 loops=1)
         Output: 1
         Index Cond: (foobar.id = 1)
         Heap Fetches: 1
         Buffers: shared hit=5
 Total runtime: 0.017 ms
(9 rows)

Time: 0.281 ms

Server B:

# EXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT 1 AS "a" FROM "foobar" WHERE ("foobar"."id" = 1) LIMIT 1;
                                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..1.11 rows=1 width=4) (actual time=0.019..0.021 rows=1 loops=1)
   Output: 1
   Buffers: shared hit=4
   ->  Index Only Scan using foobar_pkey on public.foobar  (cost=0.00..1.11 rows=1 width=4) (actual time=0.018..0.018 rows=1 loops=1)
         Output: 1
         Index Cond: (foobar.id = 1)
         Heap Fetches: 0
         Buffers: shared hit=4
 Planning Time: 0.110 ms
 Execution Time: 0.045 ms
(10 rows)

Time: 0.635 ms

RAID1 could add some latency on server B if it were reading from disk, but I've confirmed that these queries hit the buffer cache and therefore read from memory, not from disk. I checked the hit rate with the following query:

SELECT 'cache hit rate' AS name,
       sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) AS ratio
FROM pg_statio_user_tables;

The hit rate was over 95%, and it increased as soon as I ran those queries. The same holds for the index hit rate. I've also played with several PostgreSQL parameters, decreasing/increasing the number of workers, shared_buffers, work_mem, JIT, the cpu_*_cost variables, etc., but nothing helped reduce that latency.
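For anyone who wants to reproduce the comparison, something like this isolates the short query with pgbench (the script file name is just an example, and the database name may need adjusting; -r reports the average per-statement latency):

# echo 'SELECT 1 AS "a" FROM "foobar" WHERE ("foobar"."id" = 1) LIMIT 1;' > short_select.sql
# pgbench --username postgres --file short_select.sql -r --client 1 --time 20 test

Running it with a single client takes concurrency and connection setup out of the picture, so the per-statement latency line can be compared directly between the two servers.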
Here are the settings I'm currently using with PostgreSQL 15.3, after a lot of experimenting with different values:

checkpoint_completion_target = 0.9
checkpoint_timeout = 900
cpu_index_tuple_cost = 0.00001
cpu_operator_cost = 0.00001
effective_cache_size = 12GB
effective_io_concurrency = 200
jit = off
listen_addresses = 'localhost'
maintenance_work_mem = 1GB
max_connections = 100
max_parallel_maintenance_workers = 4
max_parallel_workers = 12
max_parallel_workers_per_gather = 4
max_wal_size = 4GB
max_worker_processes = 12
min_wal_size = 1GB
random_page_cost = 1.1
shared_buffers = 4GB
ssl = off
timezone = 'UTC'
wal_buffers = 16MB
work_mem = 64MB

Some notes about those settings:

- We're running other services on this server; that's why I'm not giving PostgreSQL more resources.
- Tweaking the cpu_*_cost parameters was crucial to improving the query plans. With the default values, PostgreSQL consistently chose a slower plan.

I've been looking at some Linux settings as well:

- Swappiness is set to the lowest safe value: vm.swappiness = 1
- Huge pages are not in use, and Transparent Huge Pages (THP) is set to 'madvise'. PostgreSQL 15.3 is using the default value for the huge_pages parameter: 'try'.
- The memory overcommit policy is set to 1: vm.overcommit_memory = 1

I also experimented with huge pages, to try to force PostgreSQL to use them. I manually allocated the number of huge pages reported by this query:

SHOW shared_memory_size_in_huge_pages;

I confirmed that huge pages were being used by PostgreSQL, but unfortunately I didn't see any improvement in latency or performance, so I set everything back to the previous state (the exact steps are in the P.S. below).

Conclusion: the latency is quite low on both servers, but when you run dozens or hundreds of fast, short queries concurrently, the difference adds up, and server A ends up 0.1-1.0 seconds faster than server B on aggregate. As you can see, server B has two CPU sockets, so Linux treats it as a NUMA system, and its CPU clock is slower than server A's. Could either of those be causing the extra latency?

Any suggestions or ideas on where to look? I'd really appreciate your help. Thank you.
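P.S. For reference, here is a sketch of the huge pages experiment (later reverted); <N> is a placeholder for the value reported by SHOW shared_memory_size_in_huge_pages;:

# sysctl -w vm.nr_hugepages=<N>

After that, set huge_pages = on in postgresql.conf, restart the cluster, and verify the allocation with:

# grep Huge /proc/meminfo

And regarding NUMA, a test I'm considering but haven't run yet: pin PostgreSQL to a single socket and its local memory, to rule out cross-node memory access. The data directory below is a placeholder, and on Ubuntu the cluster is normally started via pg_ctlcluster/systemd rather than plain pg_ctl:

# numactl --hardware
# numactl --cpunodebind=0 --membind=0 pg_ctl -D /path/to/data start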