Re: Feature-Request: Performance-Optimization when using limit in combination with order by on querys using union

Karsten P Thu, 08 May 2025 03:58:24 -0700

Okay, forget what i've just written, sorry for that.

I've digged deeper and checked my generic example - it works perfectly using
a merge-append in combination with limit.

i'm sorry i didn't check that first. it just won't work in my real-lifeexample.though each part of the query is using an index-scan it is than using a'normal' append

instead of a merge-append, but i don't know why.

here is the expected query plan as used with my generic example:

Limit  (cost=0.86..0.90 rows=1 width=12)
  ->  Merge Append  (cost=0.86..104314.37 rows=3435900 width=12)
        Sort Key: kpkp_orders.buchungsdatum DESC

-> Index Only Scan Backward using kpkp_orders_1 onkpkp_orders (cost=0.43..34977.68 rows=1717950 width=12) -> Index Only Scan Backward using kpkp_invoices_1 onkpkp_invoices (cost=0.43..34977.68 rows=1717950 width=12)


this is fast.

and here is the query plan used on my real-life example:


Limit  (cost=918.40..918.41 rows=1 width=8)
  ->  Sort  (cost=918.40..919.59 rows=473 width=8)
        Sort Key: s.belegdatum DESC
        ->  Subquery Scan on s  (cost=0.56..916.04 rows=473 width=8)
              ->  Append  (cost=0.56..911.31 rows=473 width=327)

-> Subquery Scan on "*SELECT* 1"(cost=0.56..624.08 rows=318 width=187) -> Index Scan usingsoit_erpprozesshistorie_12 on erpprozesshistorie eph (cost=0.56..620.10rows=318 width=181) Index Cond: ((clientid = '0'::numeric)AND (activeflag = 't'::bpchar) AND (prozessuntertyp = 2) AND (CASE WHEN((typ = 1) OR (typ = 7)) THEN 1 ELSE CASE WHEN (typ = 3) THEN 2 ELSECASE WHEN ((typ = 4) OR (typ = 9)) THEN 3 ELSE 4 END END END = 1) AND(CASE WHEN (prozesstyp = 1) THEN true ELSE false END = true) AND(artikelvariante_objid = '1064748816'::numeric)) -> Subquery Scan on "*SELECT* 2"(cost=0.43..284.87 rows=155 width=214) -> Index Scan using soit_umsatz_alt_08 onumsatz_alt (cost=0.43..282.93 rows=155 width=186) Index Cond: ((clientid = '0'::numeric)AND (activeflag = 't'::bpchar) AND (verkauf = true) AND(artikelvariante_objid = '1064748816'::numeric))

so my question is: under wich circumstance does the query-planner use orprefer the 'merge append' over 'append'?



Thanks in advance!


Am 08.05.25 um 11:57 schrieb Karsten P:

Hi,
i've already googled so far but didn't find anything regarding myproblem..
I hope i'm here at the right place.

Following situation (this is just an simplyfied example):

suppose we have two tables, lets say

orders
  - column 'order_number' -> varchar
  - column 'order_date' -> timestamp

with index on order_date

and

invoices
  - column 'invoice_number' -> varchar
  - column 'invoice_date' -> timestamp

with index on invoice_date

and many records in both if them.

now we have a view combining both of them as

create view documents as
(
select order_number as document_number, order_date asdocument_date from orders
    union all select invoice_number, invoice_date from invoices
)


finding the last order placed in the database ist really easy:

  select order_number from orders order by order_date desc limit 1

will result in an index scan backward on orders

same with invoices only...

but when querying the view
select document_number from documents order by document_date desclimit 1
seems to break down to
  - collect all rows from orders
  - combine it with all rows from invoices
  - sort all rows (descending)
  - limit to one row

with many data this is quite slow.
I've tested this with PG9.6 and PG14, it doesn't seem to make adifference (correct me if i'm wrong).
So my question is: What about optimizing the query-planner that if

- a query with unions of selects is executed
- and an 'order by' in combination with 'limit' is applied on thecomplete query (not only on subselects)
- and there is a matching index for each select
the order by and limit - part of the sql is also beeing applied oneach sub-select ?
actually
select document_number from documents order by document_date desclimit 1
is beeing processed as
    select order_number from orders
    union all select invoice_number from invoices
    order by document_number desc
    limit 1

but would it be possible to let the query-optimizer expand the query to
    select order_number from (
(select order_number, order_date from orders order byorder_date desc limit 1) union all (select invoice_number, invoice_date from invoicesorder by invoice_date desc limit 1)
    ) as subselect
   order by order_date desc
   limit 1

as this would use two (or number of unions) index-backward-scans
and than only has to reorder at maximum two rows before limiting tothe first of it?
this should be significantly faster.

thanks a lot and greetz,
Karsten

Re: Feature-Request: Performance-Optimization when using limit in combination with order by on querys using union

Reply via email to