Hi,

I'm unsure whether or not to submit this as a bug report, so I figured
I'd ask here. On PostgreSQL 18.1 with the following setup:

---------------------------------------
-- schema
CREATE TABLE test (
    a int NOT NULL,
    b int NOT NULL,
    PRIMARY KEY (a, b)
);

-- some test data
INSERT INTO test (a, b)
SELECT random(1, 10000000), n
FROM generate_series(1, 10000000) s(n);

-- just for good measure
ANALYZE test;
REINDEX TABLE test;
---------------------------------------
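
If you want to double-check the setup before running the queries below,
something like this should do it (just an optional sanity check):

---------------------------------------
-- expect exactly 10000000 rows
SELECT count(*) FROM test;

-- expect test_pkey on (a, b)
SELECT indexdef
FROM pg_indexes
WHERE tablename = 'test';
---------------------------------------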

The following query isn't optimized as I would expect:

-------------------------
SELECT sum(b) AS b
FROM (SELECT * FROM test)
WHERE a <= 1000
GROUP BY ();
-------------------------

Here's the output of EXPLAIN (ANALYZE, BUFFERS, SETTINGS):

------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=177581.33..177581.34 rows=1 width=8) (actual time=331.551..331.552 rows=1.00 loops=1)
   Group Key: ()
   Buffers: shared hit=2606 read=41642
   ->  Seq Scan on test  (cost=0.00..169248.00 rows=3333333 width=4) (actual time=2.998..331.253 rows=1012.00 loops=1)
         Filter: (a <= 1000)
         Rows Removed by Filter: 9998988
         Buffers: shared hit=2606 read=41642
 Planning Time: 0.065 ms
 JIT:
   Functions: 6
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 0.238 ms (Deform 0.055 ms), Inlining 0.000 ms, Optimization 0.134 ms, Emission 1.929 ms, Total 2.301 ms
 Execution Time: 331.852 ms
------------------------------------------------------------------------------------------------------------------------------

But! Several very small tweaks to the query all result in much better
plans. For example, removing the outer GROUP BY like so:

-------------------------
SELECT sum(b) AS b
FROM (SELECT * FROM test)
WHERE a <= 1000;
-------------------------

Results in the following plan:

-------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=33.23..33.24 rows=1 width=8) (actual time=0.157..0.158 rows=1.00 loops=1)
   Buffers: shared hit=361
   ->  Index Only Scan using test_pkey on test  (cost=0.43..30.64 rows=1040 width=4) (actual time=0.011..0.112 rows=1012.00 loops=1)
         Index Cond: (a <= 1000)
         Heap Fetches: 0
         Index Searches: 1
         Buffers: shared hit=361
 Planning:
   Buffers: shared hit=4
 Planning Time: 0.092 ms
 Execution Time: 0.171 ms
-------------------------------------------------------------------------------------------------------------------------------------

Removing the subquery:

------------------
SELECT sum(b) AS b
FROM test
WHERE a <= 1000
GROUP BY ();
------------------

Or moving the WHERE into the subquery:

-----------------------------------------
SELECT sum(b) AS b
FROM (SELECT * FROM test WHERE a <= 1000)
GROUP BY ();
-----------------------------------------

Both give this plan:

-------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=33.23..33.24 rows=1 width=8) (actual time=0.155..0.156 rows=1.00 loops=1)
   Group Key: ()
   Buffers: shared hit=361
   ->  Index Only Scan using test_pkey on test  (cost=0.43..30.64 rows=1040 width=4) (actual time=0.011..0.111 rows=1012.00 loops=1)
         Index Cond: (a <= 1000)
         Heap Fetches: 0
         Index Searches: 1
         Buffers: shared hit=361
 Planning:
   Buffers: shared hit=4
 Planning Time: 0.083 ms
 Execution Time: 0.174 ms
-------------------------------------------------------------------------------------------------------------------------------------

Adding GROUP BY or DISTINCT to the subquery gives two slightly
different, but still faster, plans. For GROUP BY:

---------------------------------------
SELECT sum(b) AS b
FROM (SELECT * FROM test GROUP BY a, b)
WHERE a <= 1000
GROUP BY ();
---------------------------------------

The plan is:

-------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=48.84..48.84 rows=1 width=8) (actual time=0.255..0.255 rows=1.00 loops=1)
   Group Key: ()
   Buffers: shared hit=361
   ->  Group  (cost=0.43..35.84 rows=1040 width=8) (actual time=0.013..0.205 rows=1012.00 loops=1)
         Group Key: test.a, test.b
         Buffers: shared hit=361
         ->  Index Only Scan using test_pkey on test  (cost=0.43..30.64 rows=1040 width=8) (actual time=0.011..0.117 rows=1012.00 loops=1)
               Index Cond: (a <= 1000)
               Heap Fetches: 0
               Index Searches: 1
               Buffers: shared hit=361
 Planning:
   Buffers: shared hit=4
 Planning Time: 0.108 ms
 Execution Time: 0.271 ms
-------------------------------------------------------------------------------------------------------------------------------------------

And for DISTINCT:

----------------------------------
SELECT sum(b) AS b
FROM (SELECT DISTINCT * FROM test)
WHERE a <= 1000
GROUP BY ();
----------------------------------

The plan is:

-------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=48.84..48.84 rows=1 width=8) (actual time=0.280..0.281 rows=1.00 loops=1)
   Group Key: ()
   Buffers: shared hit=361
   ->  Unique  (cost=0.43..35.84 rows=1040 width=8) (actual time=0.012..0.227 rows=1012.00 loops=1)
         Buffers: shared hit=361
         ->  Index Only Scan using test_pkey on test  (cost=0.43..30.64 rows=1040 width=8) (actual time=0.011..0.119 rows=1012.00 loops=1)
               Index Cond: (a <= 1000)
               Heap Fetches: 0
               Index Searches: 1
               Buffers: shared hit=361
 Planning:
   Buffers: shared hit=4
 Planning Time: 0.105 ms
 Execution Time: 0.298 ms
-------------------------------------------------------------------------------------------------------------------------------------------

This is obviously a fairly contrived example, but it's based on an issue
I had with an actual schema and query in a pet project of mine. This is
the schema:

---------------------------------------------
CREATE TYPE host AS ENUM ('router', 'azure');

CREATE TABLE packets_sent (
    sent_at bigint NOT NULL,
    nonce bigint NOT NULL,
    sent_from host NOT NULL,
    sent_to host NOT NULL,
    length smallint NOT NULL,
    PRIMARY KEY (sent_at, nonce)
);

CREATE TABLE packets_received (
    sent_at bigint NOT NULL,
    nonce bigint NOT NULL,
    received_at bigint NOT NULL,
    PRIMARY KEY (sent_at, nonce, received_at)
);
---------------------------------------------

And the query is for a Grafana dashboard, so I've commented out the
macros that Grafana would fill in and replaced them with example values:

----------------------------------------------------------------------------------------
WITH vars AS (
    SELECT
        CASE right(/* '$interval' */ '1m', 1)
            WHEN 's' THEN 1000000000::bigint
            WHEN 'm' THEN 60000000000::bigint
            WHEN 'h' THEN 3600000000000::bigint
            WHEN 'd' THEN 86400000000000::bigint
        END * left(/* '$interval' */ '1m', -1)::bigint AS interval,
        /* $__unixEpochNanoFrom() */ 1767376959485000000 AS range_start,
        /* $__unixEpochNanoTo() */ 1767380559485000000 AS range_end
), packets AS (
    SELECT
        ps.sent_at,
        ps.sent_from,
        ps.sent_to,
        ps.length,
        count(pr.received_at) AS received_count
    FROM packets_sent ps
        LEFT JOIN packets_received pr ON pr.sent_at = ps.sent_at AND pr.nonce = ps.nonce
    GROUP BY ps.sent_at, ps.nonce
)
SELECT time, sent_from, sent_to, (sent - received)::numeric / sent AS loss
FROM (
    SELECT
        sent_at - (sent_at - range_start) % interval AS time,
        sent_from,
        sent_to,
        count(*) AS sent,
        count(CASE WHEN received_count > 0 THEN true END) AS received
    FROM vars, packets
    WHERE range_start <= sent_at AND sent_at < range_end
    GROUP BY GROUPING SETS ((time, sent_from, sent_to), (time))
)
ORDER BY time;
----------------------------------------------------------------------------------------
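
(For reference, only the vars CTE differs in the actual Grafana panel;
with the macros in place it looks roughly like this, the rest of the
query being identical:)

----------------------------------------------------------------------------------------
WITH vars AS (
    SELECT
        CASE right('$interval', 1)
            WHEN 's' THEN 1000000000::bigint
            WHEN 'm' THEN 60000000000::bigint
            WHEN 'h' THEN 3600000000000::bigint
            WHEN 'd' THEN 86400000000000::bigint
        END * left('$interval', -1)::bigint AS interval,
        $__unixEpochNanoFrom() AS range_start,
        $__unixEpochNanoTo() AS range_end
), ...
----------------------------------------------------------------------------------------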

Here's the plan:

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on unnamed_subquery  (cost=8388676.31..8533123.48 rows=3077064 width=48) (actual time=26257.430..26909.250 rows=180 loops=1)
   Buffers: shared hit=8370 read=413942 dirtied=101, temp read=715579 written=718408
   ->  GroupAggregate  (cost=8388676.31..8502352.84 rows=3077064 width=32) (actual time=26257.426..26909.205 rows=180 loops=1)
         Group Key: ((ps.sent_at - ((ps.sent_at - ('1767376959485000000'::bigint)) % ('60000000000'::bigint)))), ps.sent_from, ps.sent_to
         Group Key: ((ps.sent_at - ((ps.sent_at - ('1767376959485000000'::bigint)) % ('60000000000'::bigint))))
         Buffers: shared hit=8370 read=413942 dirtied=101, temp read=715579 written=718408
         ->  Sort  (cost=8388676.31..8397223.15 rows=3418738 width=24) (actual time=26256.918..26903.013 rows=72000 loops=1)
               Sort Key: ((ps.sent_at - ((ps.sent_at - ('1767376959485000000'::bigint)) % ('60000000000'::bigint)))), ps.sent_from, ps.sent_to
               Sort Method: external merge  Disk: 2400kB
               Buffers: shared hit=8370 read=413942 dirtied=101, temp read=715579 written=718408
               ->  Nested Loop  (cost=3366627.09..7877433.21 rows=3418738 width=24) (actual time=26196.265..26884.902 rows=72000 loops=1)
                     Join Filter: ((('1767376959485000000'::bigint) <= ps.sent_at) AND (ps.sent_at < ('1767380559485000000'::bigint)))
                     Rows Removed by Join Filter: 30696620
                     Buffers: shared hit=8370 read=413942 dirtied=101, temp read=715279 written=718107
                     ->  Result  (cost=0.00..0.01 rows=1 width=24) (actual time=134.623..134.623 rows=1 loops=1)
                     ->  Finalize GroupAggregate  (cost=3366627.09..7082576.66 rows=30768640 width=34) (actual time=12079.734..25606.482 rows=30768620 loops=1)
                           Group Key: ps.sent_at, ps.nonce
                           Buffers: shared hit=8370 read=413942 dirtied=101, temp read=715279 written=718107
                           ->  Gather Merge  (cost=3366627.09..6582586.26 rows=25640534 width=32) (actual time=12079.725..20111.145 rows=30768620 loops=1)
                                 Workers Planned: 2
                                 Workers Launched: 2
                                 Buffers: shared hit=8370 read=413942 dirtied=101, temp read=715279 written=718107
                                 ->  Partial GroupAggregate  (cost=3365627.07..3622032.41 rows=12820267 width=32) (actual time=11226.123..14363.060 rows=10256207 loops=3)
                                       Group Key: ps.sent_at, ps.nonce
                                       Buffers: shared hit=8370 read=413942 dirtied=101, temp read=715279 written=718107
                                       ->  Sort  (cost=3365627.07..3397677.73 rows=12820267 width=32) (actual time=11226.085..12172.219 rows=10256207 loops=3)
                                             Sort Key: ps.sent_at, ps.nonce
                                             Sort Method: external sort  Disk: 465752kB
                                             Buffers: shared hit=8370 read=413942 dirtied=101, temp read=715279 written=718107
                                             Worker 0:  Sort Method: external merge  Disk: 419816kB
                                             Worker 1:  Sort Method: external merge  Disk: 419032kB
                                             ->  Parallel Hash Right Join  (cost=621865.67..1238598.28 rows=12820267 width=32) (actual time=5243.465..8175.280 rows=10256207 loops=3)
                                                   Hash Cond: ((pr.sent_at = ps.sent_at) AND (pr.nonce = ps.nonce))
                                                   Buffers: shared hit=8256 read=413942 dirtied=101, temp read=330923 written=333200
                                                   ->  Parallel Seq Scan on packets_received pr  (cost=0.00..324115.48 rows=12817548 width=24) (actual time=0.069..854.158 rows=10254180 loops=3)
                                                         Buffers: shared hit=3837 read=192103 dirtied=47
                                                   ->  Parallel Hash  (cost=354442.67..354442.67 rows=12820267 width=24) (actual time=2666.452..2666.453 rows=10256207 loops=3)
                                                         Buckets: 131072  Batches: 256  Memory Usage: 7680kB
                                                         Buffers: shared hit=4401 read=221839 dirtied=54, temp written=166336
                                                         ->  Parallel Seq Scan on packets_sent ps  (cost=0.00..354442.67 rows=12820267 width=24) (actual time=71.731..941.723 rows=10256207 loops=3)
                                                               Buffers: shared hit=4401 read=221839 dirtied=54
 Planning:
   Buffers: shared hit=24
 Planning Time: 0.291 ms
 JIT:
   Functions: 64
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 2.220 ms (Deform 0.769 ms), Inlining 93.214 ms, Optimization 135.933 ms, Emission 120.856 ms, Total 352.223 ms
 Execution Time: 26995.067 ms
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

In this case I ended up fixing the performance issue by replacing the
cross join against vars with scalar subqueries, like so:

---------------------------------------------------------------------------------------------------
WITH vars AS (
    SELECT
        CASE right(/* '$interval' */ '1m', 1)
            WHEN 's' THEN 1000000000::bigint
            WHEN 'm' THEN 60000000000::bigint
            WHEN 'h' THEN 3600000000000::bigint
            WHEN 'd' THEN 86400000000000::bigint
        END * left(/* '$interval' */ '1m', -1)::bigint AS interval,
        /* $__unixEpochNanoFrom() */ 1767376959485000000 AS range_start,
        /* $__unixEpochNanoTo() */ 1767380559485000000 AS range_end
), packets AS (
    SELECT
        ps.sent_at,
        ps.sent_from,
        ps.sent_to,
        ps.length,
        count(pr.received_at) AS received_count
    FROM packets_sent ps
        LEFT JOIN packets_received pr ON pr.sent_at = ps.sent_at AND pr.nonce = ps.nonce
    GROUP BY ps.sent_at, ps.nonce
)
SELECT time, sent_from, sent_to, (sent - received)::numeric / sent AS loss
FROM (
    SELECT
        sent_at - (sent_at - (SELECT range_start FROM vars)) % (SELECT interval FROM vars) AS time,
        sent_from,
        sent_to,
        count(*) AS sent,
        count(CASE WHEN received_count > 0 THEN true END) AS received
    FROM packets
    WHERE (SELECT range_start FROM vars) <= sent_at AND sent_at < (SELECT range_end FROM vars)
    GROUP BY GROUPING SETS ((time, sent_from, sent_to), (time))
)
ORDER BY time;
---------------------------------------------------------------------------------------------------

Which changes the plan to this (the scalar subqueries become InitPlans,
and the planner now uses an index scan on packets_sent_pkey for the
sent_at range):

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=478144.54..478183.52 rows=15590 width=48) (actual time=103.934..108.181 rows=180 loops=1)
   Sort Key: unnamed_subquery."time"
   Sort Method: quicksort  Memory: 31kB
   Buffers: shared hit=432210 read=165
   CTE vars
     ->  Result  (cost=0.00..0.01 rows=1 width=24) (actual time=8.944..8.945 rows=1 loops=1)
   ->  Subquery Scan on unnamed_subquery  (cost=475091.06..477058.82 rows=15590 width=48) (actual time=103.807..108.147 rows=180 loops=1)
         Buffers: shared hit=432210 read=165
         ->  HashAggregate  (cost=475091.06..476902.92 rows=15590 width=32) (actual time=103.802..108.114 rows=180 loops=1)
               Hash Key: (packets.sent_at - ((packets.sent_at - (InitPlan 2).col1) % (InitPlan 3).col1)), packets.sent_from, packets.sent_to
               Hash Key: (packets.sent_at - ((packets.sent_at - (InitPlan 2).col1) % (InitPlan 3).col1))
               Batches: 1  Memory Usage: 825kB
               Buffers: shared hit=432210 read=165
               InitPlan 2
                 ->  CTE Scan on vars  (cost=0.00..0.02 rows=1 width=8) (actual time=0.002..0.003 rows=1 loops=1)
               InitPlan 3
                 ->  CTE Scan on vars vars_1  (cost=0.00..0.02 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=1)
               InitPlan 4
                 ->  CTE Scan on vars vars_2  (cost=0.00..0.02 rows=1 width=8) (actual time=8.955..8.956 rows=1 loops=1)
               InitPlan 5
                 ->  CTE Scan on vars vars_3  (cost=0.00..0.02 rows=1 width=8) (actual time=0.003..0.004 rows=1 loops=1)
               ->  Subquery Scan on packets  (cost=1001.15..472782.42 rows=153904 width=24) (actual time=32.016..92.708 rows=72000 loops=1)
                     Buffers: shared hit=432210 read=165
                     ->  Finalize GroupAggregate  (cost=1001.15..470089.10 rows=153904 width=34) (actual time=32.008..86.553 rows=72000 loops=1)
                           Group Key: ps.sent_at, ps.nonce
                           Buffers: shared hit=432210 read=165
                           ->  Gather Merge  (cost=1001.15..467588.15 rows=128254 width=32) (actual time=31.968..72.579 rows=72000 loops=1)
                                 Workers Planned: 2
                                 Workers Launched: 2
                                 Buffers: shared hit=432210 read=165
                                 ->  Partial GroupAggregate  (cost=1.12..451784.43 rows=64127 width=32) (actual time=3.048..54.674 rows=24000 loops=3)
                                       Group Key: ps.sent_at, ps.nonce
                                       Buffers: shared hit=432210 read=165
                                       ->  Nested Loop Left Join  (cost=1.12..450662.21 rows=64127 width=32) (actual time=3.029..48.981 rows=24000 loops=3)
                                             Buffers: shared hit=432210 read=165
                                             ->  Parallel Index Scan using packets_sent_pkey on packets_sent ps  (cost=0.56..206925.89 rows=64127 width=24) (actual time=2.962..10.029 rows=24000 loops=3)
                                                   Index Cond: ((sent_at >= (InitPlan 4).col1) AND (sent_at < (InitPlan 5).col1))
                                                   Buffers: shared hit=72351 read=26
                                             ->  Index Only Scan using packets_received_pkey on packets_received pr  (cost=0.56..3.79 rows=1 width=24) (actual time=0.001..0.001 rows=1 loops=72000)
                                                   Index Cond: ((sent_at = ps.sent_at) AND (nonce = ps.nonce))
                                                   Heap Fetches: 71993
                                                   Buffers: shared hit=359859 read=139
 Planning:
   Buffers: shared hit=30
 Planning Time: 0.323 ms
 JIT:
   Functions: 58
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 1.904 ms (Deform 0.365 ms), Inlining 0.000 ms, Optimization 0.732 ms, Emission 17.105 ms, Total 19.741 ms
 Execution Time: 109.456 ms
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

So I don't need any help improving the performance. But is this behavior
worth a bug report?
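
In case it helps, here's the whole minimized case again as one
self-contained script (same schema, data and queries as above), which I
could attach to a report:

---------------------------------------
CREATE TABLE test (
    a int NOT NULL,
    b int NOT NULL,
    PRIMARY KEY (a, b)
);

INSERT INTO test (a, b)
SELECT random(1, 10000000), n
FROM generate_series(1, 10000000) s(n);

ANALYZE test;
REINDEX TABLE test;

-- seq scan, ~330 ms on my machine
EXPLAIN (ANALYZE, BUFFERS, SETTINGS)
SELECT sum(b) AS b
FROM (SELECT * FROM test)
WHERE a <= 1000
GROUP BY ();

-- index only scan, well under 1 ms
EXPLAIN (ANALYZE, BUFFERS, SETTINGS)
SELECT sum(b) AS b
FROM (SELECT * FROM test)
WHERE a <= 1000;
---------------------------------------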

Some environment info:
- Version on which I originally encountered the issue:
  PostgreSQL 17.6 (Debian 17.6-0+deb13u1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 14.2.0-19) 14.2.0, 64-bit
  (installed via postgresql Debian package)
- Version on which I tested the minimized examples:
  PostgreSQL 18.1 (Debian 18.1-1.pgdg13+2) on x86_64-pc-linux-gnu, compiled by gcc (Debian 14.2.0-19) 14.2.0, 64-bit
  (running via postgres:18.1-trixie Docker image, albeit using Podman)
- OS: Debian Trixie running in a Hyper-V VM
- Kernel: 6.12.57+deb13-amd64
- Filesystem: ext4, 107 GiB available
- CPU: AMD Ryzen 9 5950X 16-Core Processor
- RAM: 16 GiB total, 5 GiB free, 14 GiB available
- Disk: Samsung 990 PRO PCIe 4.0 SSD

Thanks!
Riley Broderick

