Pantelis Theodosiou <yperc...@gmail.com> writes: > On Thu, Jul 14, 2022 at 9:16 AM Shay Rojansky <r...@roji.org> wrote: >> I was trying to understand what - if any - are the guarantees with >> regards to ordering for combining queries (UNION/UNION ALL/...).
> No, there is no guarantee. It's just that UNION ALL works this way today > (preserving the order of the subselects) - and I'm not even sure about > that, it may not preserve the order in all cases, with different indexes or > partitioning or a parallel plan, etc. Yeah, that. You can get a parallelized plan today for UNION ALL: =# explain analyze select * from foo union all select * from foo; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------- Gather (cost=0.00..208552.05 rows=5120008 width=244) (actual time=0.652..390.135 rows=5120000 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Append (cost=0.00..208552.05 rows=2133336 width=244) (actual time=0.021..228.848 rows=1706667 loops=3) -> Parallel Seq Scan on foo (cost=0.00..98942.68 rows=1066668 width=244) (actual time=0.453..78.084 rows=853333 loops=3) -> Parallel Seq Scan on foo foo_1 (cost=0.00..98942.68 rows=1066668 width=244) (actual time=0.024..125.299 rows=1280000 loops=2) Planning Time: 0.094 ms Execution Time: 488.352 ms It's true that in simple non-parallelized cases we'll do the first query then the second, but SQL doesn't promise that to be true and neither does Postgres. >>> If you want ordered output use ORDER BY. >> I don't see how that could be done. Consider the following: >> (SELECT id FROM data ORDER BY id) >> UNION ALL >> (SELECT id FROM data ORDER BY id DESC); You do it like this: =# explain analyze (select * from foo union all select * from foo) order by unique1; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------- Gather Merge (cost=839258.04..889070.63 rows=4266672 width=244) (actual time=931.054..1707.780 rows=5120000 loops=1) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=839258.01..844591.35 rows=2133336 width=244) (actual time=924.821..1132.096 rows=1706667 loops=3) Sort Key: foo.unique1 Sort Method: external merge Disk: 433552kB Worker 0: Sort Method: external merge Disk: 423400kB Worker 1: Sort Method: external merge Disk: 415592kB -> Parallel Append (cost=0.00..208552.05 rows=2133336 width=244) (actual time=0.051..218.476 rows=1706667 loops=3) -> Parallel Seq Scan on foo (cost=0.00..98942.68 rows=1066668 width=244) (actual time=0.030..79.118 rows=853333 loops=3) -> Parallel Seq Scan on foo foo_1 (cost=0.00..98942.68 rows=1066668 width=244) (actual time=0.073..118.566 rows=1280000 loops=2) Planning Time: 0.109 ms Execution Time: 1830.390 ms (13 rows) The parentheses are actually optional here, if memory serves --- to get the ORDER BY to be applied inside the second sub-select, you'd have to use parens as Shay had it. regards, tom lane