Pantelis Theodosiou <[email protected]> writes:
> On Thu, Jul 14, 2022 at 9:16 AM Shay Rojansky <[email protected]> wrote:
>> I was trying to understand what - if any - are the guarantees with
>> regards to ordering for combining queries (UNION/UNION ALL/...).
> No, there is no guarantee. It's just that UNION ALL works this way today
> (preserving the order of the subselects) - and I'm not even sure about
> that, it may not preserve the order in all cases, with different indexes or
> partitioning or a parallel plan, etc.
Yeah, that. You can get a parallelized plan today for UNION ALL:
=# explain analyze select * from foo union all select * from foo;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=0.00..208552.05 rows=5120008 width=244) (actual
time=0.652..390.135 rows=5120000 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Append (cost=0.00..208552.05 rows=2133336 width=244) (actual
time=0.021..228.848 rows=1706667 loops=3)
-> Parallel Seq Scan on foo (cost=0.00..98942.68 rows=1066668
width=244) (actual time=0.453..78.084 rows=853333 loops=3)
-> Parallel Seq Scan on foo foo_1 (cost=0.00..98942.68 rows=1066668
width=244) (actual time=0.024..125.299 rows=1280000 loops=2)
Planning Time: 0.094 ms
Execution Time: 488.352 ms
It's true that in simple non-parallelized cases we'll do the first query
then the second, but SQL doesn't promise that to be true and neither does
Postgres.
>>> If you want ordered output use ORDER BY.
>> I don't see how that could be done. Consider the following:
>> (SELECT id FROM data ORDER BY id)
>> UNION ALL
>> (SELECT id FROM data ORDER BY id DESC);
You do it like this:
=# explain analyze (select * from foo union all select * from foo) order by
unique1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Gather Merge (cost=839258.04..889070.63 rows=4266672 width=244) (actual
time=931.054..1707.780 rows=5120000 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=839258.01..844591.35 rows=2133336 width=244) (actual
time=924.821..1132.096 rows=1706667 loops=3)
Sort Key: foo.unique1
Sort Method: external merge Disk: 433552kB
Worker 0: Sort Method: external merge Disk: 423400kB
Worker 1: Sort Method: external merge Disk: 415592kB
-> Parallel Append (cost=0.00..208552.05 rows=2133336 width=244)
(actual time=0.051..218.476 rows=1706667 loops=3)
-> Parallel Seq Scan on foo (cost=0.00..98942.68 rows=1066668
width=244) (actual time=0.030..79.118 rows=853333 loops=3)
-> Parallel Seq Scan on foo foo_1 (cost=0.00..98942.68
rows=1066668 width=244) (actual time=0.073..118.566 rows=1280000 loops=2)
Planning Time: 0.109 ms
Execution Time: 1830.390 ms
(13 rows)
The parentheses are actually optional here, if memory serves --- to get
the ORDER BY to be applied inside the second sub-select, you'd have to
use parens as Shay had it.
regards, tom lane