Dear All,

Since nobody replied, I will give it a try. I am going to implement the merge pattern described in Knuth page 365 (Section 5.4.9). Essentially it is as follows:

- create initial runs using replacement selection (basically as in the current implementation)
- add enough dummy runs of size 0 so that the number of sorted runs minus one is divisible by k-1, where k is the merge fan-in
- repeatedly merge the k smallest runs until only one run is left
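To make the pattern concrete, here is a minimal sketch in plain C of the merge schedule as I understand it. This is illustrative, not PostgreSQL code: runs are represented by their lengths only, the actual tuple merging is elided, and a real implementation would keep the runs in a min-heap instead of re-sorting.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int
cmp_long(const void *a, const void *b)
{
    long la = *(const long *) a;
    long lb = *(const long *) b;

    return (la > lb) - (la < lb);
}

/*
 * Print the merge schedule for the given initial run lengths and fan-in k:
 * pad with dummy runs of length 0 so that (nruns - 1) is a multiple of
 * (k - 1), then repeatedly merge the k shortest runs until one remains.
 */
static void
merge_schedule(const long *init, int ninit, int k)
{
    int     ndummy = (ninit - 1) % (k - 1);
    int     nruns;
    long   *len;

    if (ndummy != 0)
        ndummy = (k - 1) - ndummy;
    nruns = ninit + ndummy;

    len = calloc(nruns, sizeof(long));  /* dummy runs have length 0 */
    memcpy(len + ndummy, init, ninit * sizeof(long));

    while (nruns > 1)
    {
        long    total = 0;

        /* qsort() keeps the sketch short; a min-heap would be used here */
        qsort(len, nruns, sizeof(long), cmp_long);
        for (int i = 0; i < k; i++)
            total += len[i];
        printf("merge %d shortest runs into one of length %ld\n", k, total);

        /* replace the k merged runs with their result */
        len[k - 1] = total;
        memmove(len, len + k - 1, (nruns - k + 1) * sizeof(long));
        nruns -= k - 1;
    }
    free(len);
}

int
main(void)
{
    long    runs[] = {7, 3, 9, 2, 5};   /* initial run lengths */

    merge_schedule(runs, 5, 3);         /* fan-in k = 3 */
    return 0;
}

Because of the dummy-run padding, the number of remaining runs is always congruent to 1 modulo k-1, so every merge step consumes exactly k runs and the final step produces a single run.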
I am new to PostgreSQL, hence any advice would be appreciated. Can anybody give me some advice on how this can be done?

1. How should a run be represented? Should I reuse the tape mechanism, e.g. one tape per run, or should I use a temp buffer?

2. How should memory be allocated? I assume I divide the memory equally among the k runs, so each run gets M/k memory and each read of a run is M/k bytes.

3. Prefetching. Since the merge order is fixed in advance, we can precompute the sequence of run blocks read during the entire merge. Based on this, we know which blocks of which run to prefetch at any point in time.

4. Is it possible to perform read/write I/O while the merge is being performed, so that we overlap I/O and computation?

5. Are there any other issues that need consideration?

A sketch of what I have in mind for points 2 and 3 is at the bottom of this mail.

Best regards,

Don

On Thu, Jan 29, 2009 at 4:11 PM, Don Marvick <[email protected]> wrote:
> Dear All,
>
> I apologize if this has been discussed before. I have tried to search the
> mailing list and could not find an exact answer.
>
> Currently, Postgres uses Knuth's Algorithm 5.4.2D to perform external merge
> sort. IMHO the algorithm is designed for tapes.
>
> Why isn't the simple textbook merge pattern described in Knuth page 365
> (5.4.9) used for disk-based systems? The same merge pattern is also
> described in Ramakrishnan's textbook, and it is optimal if seek time is not
> counted, which of course is not the real-world case.
>
> Best regards,
>
> Don
>
>
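P.S. To make points 2 and 3 concrete, here is a rough sketch, assuming the runs live in a single temp file whose block layout is known in advance, and using posix_fadvise() as one possible way to issue the prefetch hint (PostgreSQL may already have a facility of its own that I should use instead). BLCKSZ and the helper names are mine, purely for illustration:

#include <stddef.h>
#include <sys/types.h>
#include <fcntl.h>

#define BLCKSZ 8192             /* PostgreSQL's default block size */

/* Divide the sort memory budget equally among the k input runs (point 2). */
static size_t
per_run_buffer(size_t work_mem, int k)
{
    size_t  nblocks = work_mem / k / BLCKSZ;

    return (nblocks > 0 ? nblocks : 1) * BLCKSZ;
}

/*
 * Hint the kernel about the next read of each run (point 3), given the
 * precomputed next block offset of each of the k runs.
 */
static void
prefetch_next(int fd, const off_t *next_block, int k, size_t bufsize)
{
    for (int i = 0; i < k; i++)
        (void) posix_fadvise(fd, next_block[i] * BLCKSZ, bufsize,
                             POSIX_FADV_WILLNEED);
}

The idea would be to call prefetch_next() each time a run's buffer is refilled, so the kernel can read the following blocks in the background while the merge consumes the current ones.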
