From: Tom Lane <[EMAIL PROTECTED]> Sent: Sep 23, 2005 2:15 PM Subject: Re: [PERFORM] Releasing memory during External sorting?

>Mark Lewis <[EMAIL PROTECTED]> writes: >> operations != passes. If you were clever, you could probably write a >> modified bubble-sort algorithm that only made 2 passes. A pass is a >> disk scan, operations are then performed (hopefully in memory) on what >> you read from the disk. So there's no theoretical log N lower-bound on >> the number of disk passes. >Given infinite memory that might be true, but I don't think I believe it >for limited memory. If you have room for K tuples in memory then it's >impossible to perform more than K*N useful comparisons per pass (ie, as >each tuple comes off the disk you can compare it to all the ones >currently in memory; anything more is certainly redundant work). So if >K < logN it's clearly not gonna work. > Actually, it's far better than that. I recall a paper I saw in one of the algorithms journals 15+ years ago that proved that if you knew the range of the data, regardless of what that range was, and had n^2 space, you could sort n items in O(n) time. Turns out that with very modest constraints on the range of the data and substantially less extra space (about the same as you'd need for Replacement Selection + External Merge Sort), you can _still_ sort in O(n) time. >It's possible that you could design an algorithm that works in a fixed >number of passes if you are allowed to assume you can hold O(log N) >tuples in memory --- and in practice that would probably work fine, >if the constant factor implied by the O() isn't too big. But it's not >really solving the general external-sort problem. > If you know nothing about the data to be sorted and must guard against the worst possible edge cases, AKA the classic definition of "the general external sorting problem", then one can't do better than some variant of Replacement Selection + Unbalanced Multiway Merge. OTOH, ITRW things are _not_ like that. We know the range of the data in our DB fields or we can safely assume it to be relatively constrained. This allows us access to much better external sorting algorithms. For example Postman Sort (the 2005 winner of the PennySort benchmark) is basically an IO optimized version of an external Radix Sort. Ron ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster