> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
> Sent: Thursday, September 29, 2005 9:10 AM
> Cc: Pg Hackers; pgsql-performance@postgresql.org
> Subject: Re: [HACKERS] [PERFORM] A Better External Sort?
>       Just to add a little anarchy in your nice debate...
>       Who really needs all the results of a sort on your terabyte
table ?

Reports with ORDER BY/GROUP BY, and many other possibilities.  40% of
mainframe CPU cycles are spent sorting.  That is because huge volumes of
data require lots of energy to be meaningfully categorized.  Let's
suppose that instead of a terabyte of data (or a petabyte or whatever)
we have 10% of it.  That's still a lot of data.

>       I guess not many people do a SELECT from such a table and want
> the
> results. 

What happens when they do?  The cases where it is already fast are not
very important.  The cases where things go into the crapper are the ones
that need attention.

>So, this leaves :
>       - Really wanting all the results, to fetch using a cursor,
>       - CLUSTER type things, where you really want everything in
>       - Aggregates (Sort->GroupAggregate), which might really need to
> the
> whole table.
>       - Complex queries where the whole dataset needs to be examined,
> order
> to return a few values
>       - Joins (again, the whole table is probably not going to be
> selected)
>       - And the ones I forgot.
>       However,
>       Most likely you only want to SELECT N rows, in some ordering :
>       - the first N (ORDER BY x LIMIT N)
>       - last N (ORDER BY x DESC LIMIT N)

For these, the QuickSelect algorithm is what is wanted.  For example:
#include <stdlib.h>
typedef double  Etype;

extern Etype    RandomSelect(Etype * A, size_t p, size_t r, size_t i);
extern size_t   RandRange(size_t a, size_t b);
extern size_t   RandomPartition(Etype * A, size_t p, size_t r);
extern size_t   Partition(Etype * A, size_t p, size_t r);

** In the following code, every reference to CLR means:
**    "Introduction to Algorithms"
**    By Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest
**    ISBN 0-07-013143-0

** CLR, page 187
Etype           RandomSelect(Etype A[], size_t p, size_t r, size_t i)
    size_t          q,
    if (p == r)
        return A[p];
    q = RandomPartition(A, p, r);
    k = q - p + 1;

    if (i <= k)
        return RandomSelect(A, p, q, i);
        return RandomSelect(A, q + 1, r, i - k);

size_t          RandRange(size_t a, size_t b)
    size_t          c = (size_t) ((double) rand() / ((double) RAND_MAX +
1) * (b - a));
    return c + a;

** CLR, page 162
size_t          RandomPartition(Etype A[], size_t p, size_t r)
    size_t          i = RandRange(p, r);
    Etype           Temp;
    Temp = A[p];
    A[p] = A[i];
    A[i] = Temp;
    return Partition(A, p, r);

** CLR, page 154
size_t          Partition(Etype A[], size_t p, size_t r)
    Etype           x,
    size_t          i,

    x = A[p];
    i = p - 1;
    j = r + 1;

    for (;;) {
        do {
        } while (!(A[j] <= x));
        do {
        } while (!(A[i] >= x));
        if (i < j) {
            temp = A[i];
            A[i] = A[j];
            A[j] = temp;
        } else
            return j;

>       - WHERE x>value ORDER BY x LIMIT N
>       - WHERE x<value ORDER BY x DESC LIMIT N
>       - and other variants
>       Or, you are doing a Merge JOIN against some other table ; in
> case,
> yes, you might need the whole sorted terabyte table, but most likely
> are WHERE clauses in the query that restrict the set, and thus, maybe
> can get some conditions or limit values on the column to sort.

Where clause filters are to be applied AFTER the join operations,
according to the SQL standard.

>       Also the new, optimized hash join, which is more memory
> might
> cover this case.

For == joins.  Not every order by is applied to joins.  And not every
join is an equal join.

>       Point is, sometimes, you only need part of the results of your
> And
> the bigger the sort, the most likely it becomes that you only want
part of
> the results.

That is an assumption that will sometimes be true, and sometimes not.
It is not possible to predict usage patterns for a general purpose
database system.

> So, while we're in the fun hand-waving, new algorithm trying
> mode, why not consider this right from the start ? (I know I'm totally
> hand-waving mode right now, so slap me if needed).
>       I'd say your new, fancy sort algorithm needs a few more input
> :
>       - Range of values that must appear in the final result of the
sort :
>               none, minimum, maximum, both, or even a set of values
from the
> other
> side of the join, hashed, or sorted.

That will already happen (or it certainly ought to)

>       - LIMIT information (first N, last N, none)

That will already happen (or it certainly ought to -- I would be pretty
surprised if it does not happen)

>       - Enhanced Limit information (first/last N values of the second
> column to
> sort, for each value of the first column) (the infamous "top10 by
> category" query)
>       - etc.

All the filters will (at some point) be applied to the data unless they
cannot be applied to the data by formal rule.
>       With this, the amount of data that needs to be kept in memory is
> dramatically reduced, from the whole table (even using your compressed
> keys, that's big) to something more manageable which will be closer to
> size of the final result set which will be returned to the client, and
> avoid a lot of effort.

Sorting the minimal set is a good idea.  Sometimes there is a big
savings there. I would be pretty surprised if a large fraction of data
that does not have to be included is actually processed during the
>       So, this would not be useful in all cases, but when it applies,
> would
> be really useful.

No argument there.  And if an algorithm is being reworked, it is a good
idea to look at things like filtering to see if all filtering that is
allowed by the language standard before the sort takes place is applied.

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not

Reply via email to