2010/9/1 Boszormenyi Zoltan <z...@cybertec.at>:
> we are experimenting with modifying table partitioning
> so the ORDER BY clause can be pushed down to
> child nodes on the grounds that:
> 1. smaller datasets are faster to sort, e.g. two datasets that almost
>    spill out to disk are faster to sort in memory and later merge them
>    than the union set that spills out to disk, or two larger sets
>    that spill out to disk are faster to sort individually than the
>    union dataset (because of the longer seeks, etc)

For what it's worth this logic doesn't necessarily hold. Sorting two
data sets in memory is only faster if you only need one result set at
a time. If you know the first set contains only records which come
before the second then you can do this but if not then you'll need all
the result sets to be available at the same time which will require
spilling them to disk. If you have to do that then you might as well
do a tapesort anyways.

> 2. individual child nodes can have indexes that produce
>    the sorted output already

This is the big win. Currently Append nodes mean throwing out any
ordering from the child nodes and that's important information to be

> Currently I am abusing the AppendPath node but later I will add a new
> executor node that will merge the sorted input coming from the children.

You realize I already did this a few years ago. Search for "merge
append" on the lists. The hard part -- as usual -- ends up being
figuring out in the planner when to do this optimization and how to
avoid exponential growth in the number of plans considered and the
amount of data retained about each plan.

For what it's worth I disagree with Tom. I think this is a situation
where we need *both* types of solution. Ideally we will be able to use
a plain Append node for cases where we know the relative ordering of
the data in different partitions, but there will always be cases where
the structured partition data doesn't actually match up with the
ordering requested and we'll need to fall back to a merge-append node.


Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to