> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:pgsql-hackers-[EMAIL PROTECTED]] On Behalf Of Tom Lane
> Sent: Friday, September 30, 2005 11:02 PM
> To: Jeffrey W. Baker
> Cc: Luke Lonergan; Josh Berkus; Ron Peacetree; pgsql-[EMAIL PROTECTED]; pgsql-performance@postgresql.org
> Subject: Re: [HACKERS] [PERFORM] A Better External Sort?
>
> "Jeffrey W. Baker" <[EMAIL PROTECTED]> writes:
> > I think the largest speedup will be to dump the multiphase merge and
> > merge all tapes in one pass, no matter how large M. Currently M is
> > capped at 6, so a sort of 60GB with 1GB sort memory needs 13 passes
> > over the tape. It could be done in a single-pass heap merge with
> > N*log(M) comparisons and, more importantly, far less input and output.
>
> I had more or less despaired of this thread yielding any usable ideas
> :-( but I think you have one here.
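For illustration only (Python rather than the tuplesort.c internals), the single-pass merge described above can be sketched like this: each sorted run contributes its current head to a heap, so merging N total tuples from M runs costs O(N * log M) comparisons and reads each tuple from disk exactly once.

```python
import heapq

def heap_merge(runs):
    """Merge any number of sorted runs in a single pass using a heap
    keyed on each run's current head element."""
    # Seed the heap with (head value, run index, position-within-run).
    heap = [(run[0], i, 0) for i, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out = []
    while heap:
        val, i, j = heapq.heappop(heap)
        out.append(val)
        # Refill from the run we just consumed, if it has more elements.
        if j + 1 < len(runs[i]):
            heapq.heappush(heap, (runs[i][j + 1], i, j + 1))
    return out

# 13 runs merged in one pass, instead of multiple 6-way merge passes.
runs = [sorted(range(k, 100, 13)) for k in range(13)]
print(heap_merge(runs) == sorted(range(100)))  # True
```

Only the head of each run needs to be in memory at any moment, which is why the merge order can grow so far beyond six before memory becomes the constraint.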
I believe I made the exact same suggestion several days ago.

> The reason the current code uses a six-way merge is that Knuth's
> figure 70 (p. 273 of volume 3, first edition) shows that there's not
> much incremental gain from using more tapes ... if you are in the
> regime where the number of runs is much greater than the number of
> tape drives. But if you can stay in the regime where only one merge
> pass is needed, that is obviously a win.
>
> I don't believe we can simply legislate that there be only one merge
> pass. That would mean that, if we end up with N runs after the
> initial run-forming phase, we need to fit N tuples in memory --- no
> matter how large N is, or how small work_mem is. But it seems like a
> good idea to try to use an N-way merge where N is as large as
> work_mem will allow. We'd not have to decide on the value of N until
> after we've completed the run-forming phase, at which time we've
> already seen every tuple once, and so we can compute a safe value for
> N as work_mem divided by largest_tuple_size. (Tape I/O buffers would
> have to be counted too, of course.)

You only need to hold the sort column(s) in memory, except for the
queue you are exhausting at the time. [And of those columns, only the
values for the smallest one in a sub-list.] Of course, the more data
from each list that you can hold at once, the fewer the disk reads
and seeks.

Another idea (not sure if it is pertinent): instead of having a fixed
size for the sort buffers, size them to the query. Given a total pool
of size M, give each sort a percentage according to the difficulty of
the work to perform. So a query with 3 small columns and a
cardinality of 1000 gets a small percentage, and a query with 10 GB
of data gets a big percentage of the available sort memory.

> It's been a good while since I looked at the sort code, and so I
> don't recall if there are any fundamental reasons for having a
> compile-time-constant value of the merge order rather than choosing
> it at runtime.
> My guess is that any inefficiencies added by making it variable would
> be well repaid by the potential savings in I/O.
>
> 			regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
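To make the runtime choice concrete, here is a back-of-the-envelope sketch (the function name and parameters are my own, not PostgreSQL's) of picking the merge order N after run forming, the way Tom describes: divide work_mem by the worst-case per-tape cost (largest tuple plus tape I/O buffer), and cap at the number of runs actually produced.

```python
def choose_merge_order(work_mem, largest_tuple_size, tape_buffer_size, num_runs):
    """Largest safe N-way merge order: how many tapes can each keep
    one worst-case tuple plus an I/O buffer resident in work_mem."""
    per_tape = largest_tuple_size + tape_buffer_size
    n = work_mem // per_tape
    # Never drop below a 2-way merge, never exceed the run count.
    return max(2, min(n, num_runs))

# 1 GB work_mem, 1 kB worst-case tuples, 256 kB tape buffers, 60 runs:
# all 60 runs can be merged in a single pass.
print(choose_merge_order(1 << 30, 1024, 256 * 1024, 60))  # 60
```

Since largest_tuple_size is only known after the run-forming phase has seen every tuple, this computation naturally happens at the point where the merge order must be decided anyway.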