Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-10 Thread Zeugswetter Andreas DCP SD
Two pass will create the count of subfiles proportional to: Subfile_count = original_stream_size/sort_memory_buffer_size The merge pass requires (sizeof record * subfile_count) memory. That is true from an algorithmic perspective. But to make the merge efficient you would need to have
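The memory arithmetic in the preview above can be sketched minimally. This is an illustrative back-of-envelope, not PostgreSQL code; `two_pass_costs` and all figures are invented for the example, and `merge_memory` is the bare algorithmic minimum of one in-memory record per run — the point being made is that an *efficient* merge would need far more buffering per run than that.

```python
# Illustrative sketch (assumed names, not PostgreSQL internals):
# a two-pass external sort writes one sorted run ("subfile") per
# memory-load, then merges all runs in a single pass.

def two_pass_costs(stream_size, sort_mem, record_size):
    """Return (run_count, merge_memory) for a single-merge external sort.

    merge_memory is the algorithmic minimum: one record per run for the
    merge heap. Efficient merging needs much larger per-run buffers.
    """
    run_count = -(-stream_size // sort_mem)      # ceiling division
    merge_memory = record_size * run_count       # one record per run
    return run_count, merge_memory

# Example: 100 GB stream, 1 GB sort memory, 100-byte records
runs, mem = two_pass_costs(100 * 2**30, 2**30, 100)
# 100 runs; the minimum merge memory is only 100 records (10 kB),
# which is why the "to make the merge efficient" caveat matters.
```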

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-10 Thread Martijn van Oosterhout
On Fri, Mar 10, 2006 at 09:57:28AM +0100, Zeugswetter Andreas DCP SD wrote: Two pass will create the count of subfiles proportional to: Subfile_count = original_stream_size/sort_memory_buffer_size The merge pass requires (sizeof record * subfile_count) memory. That is true from an

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-09 Thread Zeugswetter Andreas DCP SD
This amounts to an assumption that you have infinite work_mem, in which case you hardly need an external sort at all. If your work_mem is in fact finite, then at some point you need more than two passes. I'm not really interested in ripping out support for sort operations that

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-09 Thread Hannu Krosing
On a nice day, Wed, 2006-03-08 at 20:08, Jim C. Nasby wrote: But it will take a whole lot of those rewinds to equal the amount of time required by an additional pass through the data. I guess that missing a sector read also implies a rewind, i.e. if you don't process the data read from

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-09 Thread Florian Weimer
* Greg Stark: That's one thing that gives me pause about the current approach of using more tapes. It seems like ideally the user would create a temporary work space on each spindle and the database would arrange to use no more than that number of tapes. Then each merge operation would

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-09 Thread Luke Lonergan
Jim, On 3/9/06 8:35 AM, Jim C. Nasby [EMAIL PROTECTED] wrote: Well, the reality remains though; most folks are unlikely to setup enough dedicated temp areas so that we can do one tape per disk, so it would be really good to have a sort method that didn't rely on that. Agreed - however

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-09 Thread Luke Lonergan
Tom, On 3/9/06 9:44 AM, Tom Lane [EMAIL PROTECTED] wrote: I think this argumentation hinges on some irrational aversion to the word tape. Given adequate work_mem, the CVS-tip behavior is exactly what you propose already (at least for the cases where we don't need random access to the sort

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-09 Thread Tom Lane
Luke Lonergan [EMAIL PROTECTED] writes: I would only suggest that we replace the existing algorithm with one that will work regardless of (reasonable) memory requirements. Perhaps we can agree that at least 1MB of RAM for external sorting will always be available and proceed from there? If

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-09 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote: Luke Lonergan [EMAIL PROTECTED] writes: I would only suggest that we replace the existing algorithm with one that will work regardless of (reasonable) memory requirements. Perhaps we can agree that at least 1MB of RAM for external sorting will always

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-09 Thread Dann Corbit
-Original Message- From: Stephen Frost [mailto:[EMAIL PROTECTED] Sent: Thursday, March 09, 2006 3:49 PM To: Tom Lane Cc: Luke Lonergan; Jim C. Nasby; Greg Stark; Dann Corbit; Simon Riggs; pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Merge algorithms for large numbers of tapes

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-09 Thread Tom Lane
Stephen Frost [EMAIL PROTECTED] writes: So, if we get a huge performance increase, what's wrong with: if [ sqrt(est(total)) = work_mem ]; then two-pass-sort(); else tape-sort(); fi ? Possibly nothing. However, from an algorithmic point of view the CVS-tip code *is* two-pass-sort,
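The shell fragment quoted above is a decision rule; a minimal Python rendering follows. The function name is invented for illustration, and the test drops the block-size factor just as the thread's simplified version does — the classic bound is that a one-merge-pass sort covers roughly `work_mem**2 / block_size` bytes.

```python
import math

# Illustrative sketch of the quoted heuristic (assumed names, not
# PostgreSQL's actual planner logic): use a single-merge two-pass
# sort when it can cover the input, else fall back to tape sort.

def choose_sort(total_bytes, work_mem_bytes):
    # Simplified test from the thread: sqrt(estimated total) vs work_mem.
    if math.sqrt(total_bytes) <= work_mem_bytes:
        return "two-pass-sort"
    return "tape-sort"

# A 10 GB input needs only ~104 kB by this test, so even modest
# work_mem settings select the two-pass path.
print(choose_sort(10 * 2**30, 64 * 2**20))  # prints two-pass-sort
```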

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-09 Thread Luke Lonergan
Stephen, On 3/9/06 3:48 PM, Stephen Frost [EMAIL PROTECTED] wrote: So, if we get a huge performance increase, what's wrong with: if [ sqrt(est(total)) = work_mem ]; then two-pass-sort(); else tape-sort(); fi I have something similar but less complex in mind. One of the observed

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-09 Thread Luke Lonergan
Tom, On 3/9/06 3:59 PM, Tom Lane [EMAIL PROTECTED] wrote: Possibly nothing. However, from an algorithmic point of view the CVS-tip code *is* two-pass-sort, given adequate work_mem and no requirement for random access. Further, the available profile data doesn't show any indication that the

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-09 Thread Luke Lonergan
Dann, On 3/9/06 3:56 PM, Dann Corbit [EMAIL PROTECTED] wrote: Two pass does not require sqrt(total) memory. This figure is clearly wrong. Clearly you haven't read the paper I posted previously in this thread from 1986 written by Jim Gray at Tandem. - Luke ---(end

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-08 Thread Simon Riggs
On Tue, 2006-03-07 at 18:14 -0500, Tom Lane wrote: BTW, I was just looking over Knuth's discussion of sorting again, and realized that there is still something more that could be done within the existing sort framework. We currently use standard polyphase merge (his Algorithm 5.4.2D), which

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-08 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes: 1. Earlier we had some results that showed that the heapsorts got slower when work_mem was higher and that concerns me most of all right now. Fair enough, but that's completely independent of the merge algorithm. (I don't think the Nyberg results

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-08 Thread Luke Lonergan
Tom, On 3/8/06 7:21 AM, Tom Lane [EMAIL PROTECTED] wrote: Simon Riggs [EMAIL PROTECTED] writes: 1. Earlier we had some results that showed that the heapsorts got slower when work_mem was higher and that concerns me most of all right now. Fair enough, but that's completely independent of

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-08 Thread Jim C. Nasby
On Wed, Mar 08, 2006 at 07:28:16AM -0800, Luke Lonergan wrote: I think this would be extremely dangerous, as it would encourage processes to take more than their fair share of available resources. I agree - in fact, we currently have no structured concept of fair share of available

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-08 Thread Tom Lane
Jim C. Nasby [EMAIL PROTECTED] writes: If we do have to fail to disk, cut back to 128MB, because having 8x that certainly won't make the sort run anywhere close to 8x faster. Not sure that follows. In particular, the entire point of the recent changes has been to extend the range in which we

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-08 Thread Jim C. Nasby
On Wed, Mar 08, 2006 at 11:20:50AM -0500, Tom Lane wrote: Jim C. Nasby [EMAIL PROTECTED] writes: If we do have to fail to disk, cut back to 128MB, because having 8x that certainly won't make the sort run anywhere close to 8x faster. Not sure that follows. In particular, the entire point

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-08 Thread Simon Riggs
On Wed, 2006-03-08 at 10:21 -0500, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: 1. Earlier we had some results that showed that the heapsorts got slower when work_mem was higher and that concerns me most of all right now. Fair enough, but that's completely independent of the merge

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-08 Thread Luke Lonergan
Jim, On 3/8/06 9:49 AM, Jim C. Nasby [EMAIL PROTECTED] wrote: On Wed, Mar 08, 2006 at 11:20:50AM -0500, Tom Lane wrote: Not sure that follows. In particular, the entire point of the recent changes has been to extend the range in which we can use a single merge pass --- that is, write the

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-08 Thread Tom Lane
Jim C. Nasby [EMAIL PROTECTED] writes: But do fewer/longer sorted runs translate into not merging back to disk? I thought that was controlled by if we had to be able to rewind the result set. A plain SELECT ... ORDER BY doesn't assume that anymore. It is still required for some cases such as

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-08 Thread Dann Corbit
Riggs; pgsql-hackers@postgreSQL.org Subject: Re: [HACKERS] Merge algorithms for large numbers of tapes Jim C. Nasby [EMAIL PROTECTED] writes: But do fewer/longer sorted runs translate into not merging back to disk? I thought that was controlled by if we had to be able to rewind the result

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-08 Thread Luke Lonergan
Dann, On 3/8/06 12:39 PM, Dann Corbit [EMAIL PROTECTED] wrote: Here are some suggestions of things that I know work really, really well: Can you point to an example? That might help move the discussion along. The reason to interject about the tape goo in this discussion is that we seem to

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-08 Thread Dann Corbit
-Original Message- From: Luke Lonergan [mailto:[EMAIL PROTECTED] Sent: Wednesday, March 08, 2006 1:52 PM To: Dann Corbit; Tom Lane; Jim C. Nasby Cc: Simon Riggs; pgsql-hackers@postgreSQL.org Subject: Re: [HACKERS] Merge algorithms for large numbers of tapes Dann, On 3/8/06 12

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-08 Thread Dann Corbit
To: Luke Lonergan; Tom Lane; Jim C. Nasby Cc: Simon Riggs; pgsql-hackers@postgreSQL.org Subject: Re: [HACKERS] Merge algorithms for large numbers of tapes -Original Message- From: Luke Lonergan [mailto:[EMAIL PROTECTED] Sent: Wednesday, March 08, 2006 1:52 PM To: Dann Corbit; Tom Lane

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-08 Thread Tom Lane
Dann Corbit [EMAIL PROTECTED] writes: Here are some suggestions of things that I know work really, really well: #1. Two pass merge (none of that silly poly-tape merge goo) This amounts to an assumption that you have infinite work_mem, in which case you hardly need an external sort at all. If

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-08 Thread Dann Corbit
-Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: Wednesday, March 08, 2006 3:17 PM To: Dann Corbit Cc: Jim C. Nasby; Luke Lonergan; Simon Riggs; pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Merge algorithms for large numbers of tapes Dann Corbit [EMAIL

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-08 Thread Greg Stark
Luke Lonergan [EMAIL PROTECTED] writes: I am pretty sure from this thread that PostgreSQL is not doing #1, and I have no idea if it is doing #2. Yep. Even Knuth says that the tape goo is only interesting from a historical perspective and may not be relevant in an era of disk drives. As

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-08 Thread Dann Corbit
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Wednesday, March 08, 2006 3:56 PM To: Luke Lonergan Cc: Dann Corbit; Tom Lane; Jim C. Nasby; Simon Riggs; pgsql- [EMAIL PROTECTED] Subject: Re: [HACKERS] Merge algorithms for large numbers of tapes Luke

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-08 Thread Jim C. Nasby
On Wed, Mar 08, 2006 at 10:49:16AM -0800, Luke Lonergan wrote: Jim, On 3/8/06 9:49 AM, Jim C. Nasby [EMAIL PROTECTED] wrote: On Wed, Mar 08, 2006 at 11:20:50AM -0500, Tom Lane wrote: Not sure that follows. In particular, the entire point of the recent changes has been to extend the

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-08 Thread Jonah H. Harris
Riggs; pgsql- [EMAIL PROTECTED] Subject: Re: [HACKERS] Merge algorithms for large numbers of tapes Luke Lonergan [EMAIL PROTECTED] writes: I am pretty sure from this thread that PostgreSQL is not doing #1, and I have no idea if it is doing #2. Yep. Even Knuth says that the tape goo is only

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-08 Thread Jim C. Nasby
On Wed, Mar 08, 2006 at 03:35:53PM -0800, Dann Corbit wrote: I think I did not explain it clearly enough. Suppose that you have a set of rows you need to sort. Instead of loading the whole row into memory, just load the columns (or parts of columns) that are being sorted. I hope that it is
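The key-only sorting idea described above — sort just the columns being compared, not whole rows — can be sketched minimally. `key_only_sort` and the sample rows are invented for illustration and are not PostgreSQL internals; the sketch tags each key with its row index, sorts the lightweight pairs, and fetches the full rows afterwards.

```python
# Illustrative sketch of key-only (key-prefix) sorting: keep
# (key, row_index) pairs in the sort, not the wide rows themselves.

def key_only_sort(rows, key_of):
    # Sort small (key, index) pairs; wide payloads stay out of the sort.
    tagged = sorted((key_of(row), i) for i, row in enumerate(rows))
    # Gather full rows in sorted order afterwards.
    return [rows[i] for _, i in tagged]

rows = [("carol", 3, "x" * 100), ("alice", 1, "y" * 100), ("bob", 2, "z" * 100)]
print([r[0] for r in key_only_sort(rows, lambda r: r[0])])
# prints ['alice', 'bob', 'carol']
```

The payoff is that sort memory holds many more sort items per byte, at the cost of a random-access gather pass at the end.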

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-08 Thread Dann Corbit
-Original Message- From: Jim C. Nasby [mailto:[EMAIL PROTECTED] Sent: Wednesday, March 08, 2006 5:44 PM To: Dann Corbit Cc: Tom Lane; Luke Lonergan; Simon Riggs; pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Merge algorithms for large numbers of tapes On Wed, Mar 08, 2006

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-08 Thread Jim C. Nasby
On Wed, Mar 08, 2006 at 06:55:59PM -0500, Greg Stark wrote: Luke Lonergan [EMAIL PROTECTED] writes: I am pretty sure from this thread that PostgreSQL is not doing #1, and I have no idea if it is doing #2. Yep. Even Knuth says that the tape goo is only interesting from a

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-08 Thread Andrew Dunstan
Dann Corbit wrote: I do not clearly understand the sorting code in PostgreSQL. If I did have a good grasp of it, I would take a go at improving it. Show me the code (and the benchmarks). Seriously. We see regular discussions on this and similar topics, but I haven't seen a patch that

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-08 Thread Greg Stark
Jim C. Nasby [EMAIL PROTECTED] writes: On Wed, Mar 08, 2006 at 06:55:59PM -0500, Greg Stark wrote: Luke Lonergan [EMAIL PROTECTED] writes: I am pretty sure from this thread that PostgreSQL is not doing #1, and I have no idea if it is doing #2. Yep. Even Knuth says that

[HACKERS] Merge algorithms for large numbers of tapes

2006-03-07 Thread Tom Lane
BTW, I was just looking over Knuth's discussion of sorting again, and realized that there is still something more that could be done within the existing sort framework. We currently use standard polyphase merge (his Algorithm 5.4.2D), which IIRC I chose because it was simple and for relatively
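For reference, polyphase merge (Knuth's Algorithm 5.4.2D, mentioned above) performs best when the initial runs are distributed over the T-1 input tapes in a generalized-Fibonacci pattern. The sketch below generates those "perfect" distributions; the function name and interface are invented for illustration, not taken from the PostgreSQL sources.

```python
# Illustrative sketch: perfect polyphase run distributions for T tapes.
# Level 0 is one run on one tape; each next level comes from merging
# the smallest tape's runs into the others (generalized Fibonacci).

def polyphase_levels(tapes, max_level):
    """Yield ideal per-tape run counts for levels 0..max_level."""
    counts = [1] + [0] * (tapes - 2)   # T-1 input tapes
    for _ in range(max_level + 1):
        yield tuple(counts)
        a = counts[0]
        # Next level: each smaller tape gains a runs; a moves to the end.
        counts = [c + a for c in counts[1:]] + [a]

# With 4 tapes the totals are 1, 3, 5, 9, 17, ... (order-3 Fibonacci)
for level, dist in enumerate(polyphase_levels(4, 4)):
    print(level, dist, sum(dist))
```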

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-07 Thread Jonah H. Harris
On 3/7/06, Tom Lane [EMAIL PROTECTED] wrote: BTW, I was just looking over Knuth's discussion of sorting again, and realized that there is still something more that could be done within the existing sort framework. We currently use standard polyphase merge (his Algorithm 5.4.2D), which IIRC I chose

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-07 Thread Dann Corbit
-files, it still only takes a total of two read-write passes. From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Jonah H. Harris Sent: Tuesday, March 07, 2006 5:28 PM To: Tom Lane Cc: Simon Riggs; pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Merge algorithms for large

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-07 Thread Luke Lonergan
Tom, fewer passes when T is large. Do you want to try that? Two passes is the state-of-the-practice on external disk sorts. If we're looking to replace the tape sort approach, I would hope for a two pass approach, with the merge pass avoided in the case of unidirectional access. - Luke

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-07 Thread Tom Lane
Luke Lonergan [EMAIL PROTECTED] writes: Two passes is the state-of-the-practice on external disk sorts. There is no such thing as a fixed number of passes regardless of available memory and size of the data. regards, tom lane ---(end of
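The point in the reply above can be illustrated with back-of-envelope arithmetic: for any fixed sort memory, the pass count grows (logarithmically) with input size, so no constant number of passes can hold for all inputs. All names and figures below are invented for illustration, not PostgreSQL's actual costing.

```python
# Illustrative sketch: total passes over the data for an external sort
# with a fixed fan-in merge. One run-formation pass, then however many
# merge passes are needed to reduce the runs to one.

def int_log_ceil(n, base):
    """Smallest k with base**k >= n, in exact integer arithmetic."""
    k, p = 0, 1
    while p < n:
        p *= base
        k += 1
    return k

def merge_passes(total_bytes, work_mem_bytes, fan_in):
    """Run-formation pass plus the merge passes the run count forces."""
    runs = max(1, -(-total_bytes // work_mem_bytes))  # ceiling division
    return 1 + int_log_ceil(runs, fan_in)

# 1 MB of work_mem and a 32-way merge: passes keep growing with size
for size_gb in (1, 10, 100, 1000):
    print(size_gb, "GB ->", merge_passes(size_gb * 2**30, 2**20, 32), "passes")
```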

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-07 Thread Luke Lonergan
Yes, all of the current best practice external sorts use two passes. A first to produce the runs, which results in S number of files, then a single merge pass across the runs. At most 1 pass across the S runs is required

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-07 Thread Greg Stark
Jonah H. Harris [EMAIL PROTECTED] writes: On 3/7/06, Tom Lane [EMAIL PROTECTED] wrote: However, now that we've changed the code to prefer large numbers of tapes, it's not at all clear that Algorithm D is still the right one to use. In particular I'm looking at cascade merge, Algorithm

Re: [HACKERS] Merge algorithms for large numbers of tapes

2006-03-07 Thread Luke Lonergan
Tom, On 3/7/06 8:03 PM, Tom Lane [EMAIL PROTECTED] wrote: Luke Lonergan [EMAIL PROTECTED] writes: Two passes is the state-of-the-practice on external disk sorts. There is no such thing as a fixed number of passes regardless of available memory and size of the data. While technically