Pantelis Theodosiou <yperc...@gmail.com> writes:
> On Thu, Jul 14, 2022 at 9:16 AM Shay Rojansky <r...@roji.org> wrote:
>> I was trying to understand what - if any - are the guarantees with
>> regards to ordering for combining queries (UNION/UNION ALL/...).

> No, there is no guarantee. It's just that UNION ALL works this way today
> (preserving the order of the subselects) - and I'm not even sure about
> that, it may not preserve the order in all cases, with different indexes or
> partitioning or a parallel plan, etc.

Yeah, that.  You can get a parallelized plan today for UNION ALL:

=# explain analyze select * from foo union all select * from foo;
                                                                 QUERY PLAN     
                                                            
--------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=0.00..208552.05 rows=5120008 width=244) (actual 
time=0.652..390.135 rows=5120000 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Append  (cost=0.00..208552.05 rows=2133336 width=244) (actual 
time=0.021..228.848 rows=1706667 loops=3)
         ->  Parallel Seq Scan on foo  (cost=0.00..98942.68 rows=1066668 
width=244) (actual time=0.453..78.084 rows=853333 loops=3)
         ->  Parallel Seq Scan on foo foo_1  (cost=0.00..98942.68 rows=1066668 
width=244) (actual time=0.024..125.299 rows=1280000 loops=2)
 Planning Time: 0.094 ms
 Execution Time: 488.352 ms

It's true that in simple non-parallelized cases we'll do the first query
then the second, but SQL doesn't promise that to be true and neither does
Postgres.

>>> If you want ordered output use ORDER BY.

>> I don't see how that could be done. Consider the following:
>> (SELECT id FROM data ORDER BY id)
>> UNION ALL
>> (SELECT id FROM data ORDER BY id DESC);

You do it like this:

=# explain analyze (select * from foo union all select * from foo) order by 
unique1;
                                                                    QUERY PLAN  
                                                                  
--------------------------------------------------------------------------------------------------------------------------------------------------
 Gather Merge  (cost=839258.04..889070.63 rows=4266672 width=244) (actual 
time=931.054..1707.780 rows=5120000 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Sort  (cost=839258.01..844591.35 rows=2133336 width=244) (actual 
time=924.821..1132.096 rows=1706667 loops=3)
         Sort Key: foo.unique1
         Sort Method: external merge  Disk: 433552kB
         Worker 0:  Sort Method: external merge  Disk: 423400kB
         Worker 1:  Sort Method: external merge  Disk: 415592kB
         ->  Parallel Append  (cost=0.00..208552.05 rows=2133336 width=244) 
(actual time=0.051..218.476 rows=1706667 loops=3)
               ->  Parallel Seq Scan on foo  (cost=0.00..98942.68 rows=1066668 
width=244) (actual time=0.030..79.118 rows=853333 loops=3)
               ->  Parallel Seq Scan on foo foo_1  (cost=0.00..98942.68 
rows=1066668 width=244) (actual time=0.073..118.566 rows=1280000 loops=2)
 Planning Time: 0.109 ms
 Execution Time: 1830.390 ms
(13 rows)

The parentheses are actually optional here, if memory serves --- to get
the ORDER BY to be applied inside the second sub-select, you'd have to
use parens as Shay had it.

                        regards, tom lane


Reply via email to