Re: [HACKERS] Memory usage during sorting

2012-05-01 Thread Jim Nasby
On 4/17/12 7:19 AM, Greg Stark wrote: On Mon, Apr 16, 2012 at 10:42 PM, Peter Geoghegan wrote: > All but 4 regression tests pass, but they don't really count > as failures, since they're down to an assumption in the tests that the > order certain tuples appear should be the same as our curre

Re: [HACKERS] Memory usage during sorting

2012-04-17 Thread Greg Stark
On Mon, Apr 16, 2012 at 10:42 PM, Peter Geoghegan wrote: > All but 4 regression tests pass, but they don't really count > as failures, since they're down to an assumption in the tests that the > order certain tuples appear should be the same as our current > quicksort implementation returns them,

Re: [HACKERS] Memory usage during sorting

2012-04-16 Thread Peter Geoghegan
On 14 April 2012 14:34, Peter Geoghegan wrote: > FWIW, I started playing with adding timsort to Postgres last night: > > https://github.com/Peter2ndQuadrant/postgres/tree/timsort I've fixed this feature-branch so that every qsort_arg call site (including the tuplesort specialisations thereof) cal

Re: [HACKERS] Memory usage during sorting

2012-04-14 Thread Peter Geoghegan
On 14 April 2012 13:32, Greg Stark wrote: > On Fri, Apr 13, 2012 at 7:01 PM, Peter Geoghegan > wrote: >> Well, timsort is specifically designed to take advantage of pre-sorted >> data. It does appear to have a lot of traction, as wikipedia points >> out: > > I hadn't heard of it. But reading up

Re: [HACKERS] Memory usage during sorting

2012-04-14 Thread Greg Stark
On Fri, Apr 13, 2012 at 7:01 PM, Peter Geoghegan wrote: > Well, timsort is specifically designed to take advantage of pre-sorted > data. It does appear to have a lot of traction, as wikipedia points > out: I hadn't heard of it. But reading up on it it does seem like a good fit for us. It trades s

Re: [HACKERS] Memory usage during sorting

2012-04-13 Thread Peter Geoghegan
On 13 April 2012 18:33, Robert Haas wrote: > We do use insertion sort for partitions of size < 7.  I assume you are > referring to something else. I stand corrected. My confusion came from the removal of a later *switch* to insertion sort in a3f0b3d68f9a5357a3f72b40a45bcc714a9e0649, which also ad

Re: [HACKERS] Memory usage during sorting

2012-04-13 Thread Robert Haas
On Fri, Apr 13, 2012 at 1:15 PM, Peter Geoghegan wrote: > On 13 April 2012 17:42, Peter Geoghegan wrote: >> One insight that I had at the time was that text comparisons where the >> c locale isn't used are really rather expensive, and I doubt that >> there is too much that can be done to address

Re: [HACKERS] Memory usage during sorting

2012-04-13 Thread Peter Geoghegan
On 13 April 2012 17:42, Peter Geoghegan wrote: > One insight that I had at the time was that text comparisons where the > c locale isn't used are really rather expensive, and I doubt that > there is too much that can be done to address that directly.  However, > if we were to support timsort, that

Re: [HACKERS] Memory usage during sorting

2012-04-13 Thread Peter Geoghegan
On 13 April 2012 16:03, Greg Stark wrote: > Fwiw it also only holds for comparison sorts. If you can sort your > items without actually comparing each item with the others then you > aren't bound by it at all. Notably algorithms like radix sort and > direct sort aren't bound by it and are O(n). I'

Re: [HACKERS] Memory usage during sorting

2012-04-13 Thread Greg Stark
On Mon, Mar 19, 2012 at 7:23 PM, Robert Haas wrote: > On Sun, Mar 18, 2012 at 11:25 AM, Tom Lane wrote: >> So has somebody found a hole in the n log n lower bound on the cost of >> comparison-based sorting?  I thought that had been proven pretty >> rigorously. > > There's not much danger of anyon

Re: [HACKERS] Memory usage during sorting

2012-04-13 Thread Robert Haas
On Wed, Mar 21, 2012 at 8:57 PM, Jeff Janes wrote: > On Mon, Mar 19, 2012 at 12:23 PM, Robert Haas wrote: > ... >> >> One thing that seems inefficient to me about our current algorithm is >> the use of the run number as a leading column in the sort key. >> There's no real reason why the tuples de

Re: [HACKERS] Memory usage during sorting

2012-04-13 Thread Tom Lane
Jeff Davis writes: > On Sun, 2012-03-18 at 11:25 -0400, Tom Lane wrote: >> Yeah, that was me, and it came out of actual user complaints ten or more >> years back. (It's actually not 2X growth but more like 4X growth >> according to the comments in logtape.c, though I no longer remember the >> exa

Re: [HACKERS] Memory usage during sorting

2012-04-12 Thread Jeff Davis
On Sun, 2012-03-18 at 11:25 -0400, Tom Lane wrote: > Yeah, that was me, and it came out of actual user complaints ten or more > years back. (It's actually not 2X growth but more like 4X growth > according to the comments in logtape.c, though I no longer remember the > exact reasons why.) We knew

Re: [HACKERS] Memory usage during sorting

2012-03-21 Thread Jeff Janes
On Mon, Mar 19, 2012 at 12:23 PM, Robert Haas wrote: ... > > One thing that seems inefficient to me about our current algorithm is > the use of the run number as a leading column in the sort key. > There's no real reason why the tuples destined for the next run need > to be maintained in heap orde

Re: [HACKERS] Memory usage during sorting

2012-03-21 Thread Robert Haas
On Tue, Mar 20, 2012 at 6:16 PM, Greg Stark wrote: > On Tue, Mar 20, 2012 at 8:00 PM, Robert Haas wrote: >>> Frankly that analysis didn't make any sense to me at the time. >>> Comparing integers is fast, sure, but it's still slower than not >>> having to do any comparison at all. >> >> I think yo

Re: [HACKERS] Memory usage during sorting

2012-03-20 Thread Greg Stark
On Tue, Mar 20, 2012 at 8:00 PM, Robert Haas wrote: >> Frankly that analysis didn't make any sense to me at the time. >> Comparing integers is fast, sure, but it's still slower than not >> having to do any comparison at all. > > I think you're underestimating how much it costs to call the > dataty

Re: [HACKERS] Memory usage during sorting

2012-03-20 Thread Tom Lane
Robert Haas writes: > Yeah, I think I'm going to try implementing > quicksort-the-whole-batch-and-dump-it-out-as-a-run algorithm, just to > see how good or bad that is compared to what we have now. We may not > end up doing anything that remotely resembles that, in the end, but I > want to see th

Re: [HACKERS] Memory usage during sorting

2012-03-20 Thread Jim Nasby
On 3/18/12 10:25 AM, Tom Lane wrote: Jeff Janes writes: > On Wed, Mar 7, 2012 at 11:55 AM, Robert Haas wrote: >> On Sat, Mar 3, 2012 at 4:15 PM, Jeff Janes wrote: >>> Anyway, I think the logtape could use redoing. > The problem there is that none of the files can be deleted until it >

Re: [HACKERS] Memory usage during sorting

2012-03-20 Thread Robert Haas
On Tue, Mar 20, 2012 at 3:41 PM, Greg Stark wrote: > On Tue, Mar 20, 2012 at 5:04 PM, Robert Haas wrote: >> No.  It does the opposite: it slows it down.  This is a highly >> surprising result but it's quite repeatable: removing comparisons >> makes it slower.  As previously pontificated, I think

Re: [HACKERS] Memory usage during sorting

2012-03-20 Thread Greg Stark
On Tue, Mar 20, 2012 at 5:04 PM, Robert Haas wrote: > No.  It does the opposite: it slows it down.  This is a highly > surprising result but it's quite repeatable: removing comparisons > makes it slower.  As previously pontificated, I think this is probably > because the heap can fill up with next

Re: [HACKERS] Memory usage during sorting

2012-03-20 Thread Robert Haas
On Tue, Mar 20, 2012 at 12:33 PM, Tom Lane wrote: > Robert Haas writes: >> On Tue, Mar 20, 2012 at 7:44 AM, Greg Stark wrote: >>> Offhand I wonder if this is all because we don't have the O(n) heapify >>> implemented. > >> I'm pretty sure that's not the problem.  Even though our heapify is >> no

Re: [HACKERS] Memory usage during sorting

2012-03-20 Thread Tom Lane
Robert Haas writes: > On Tue, Mar 20, 2012 at 7:44 AM, Greg Stark wrote: >> Offhand I wonder if this is all because we don't have the O(n) heapify >> implemented. > I'm pretty sure that's not the problem. Even though our heapify is > not as efficient as it could be, it's plenty fast enough. I

Re: [HACKERS] Memory usage during sorting

2012-03-20 Thread Robert Haas
On Tue, Mar 20, 2012 at 12:12 PM, Tom Lane wrote: > Greg Stark writes: >> http://en.wikipedia.org/wiki/Binary_heap#Building_a_heap > > Interesting.  I'm pretty sure that idea appears nowhere in Knuth > (which might mean it's new enough to have a live patent on it ... > anybody know who invented t

Re: [HACKERS] Memory usage during sorting

2012-03-20 Thread Robert Haas
On Tue, Mar 20, 2012 at 7:44 AM, Greg Stark wrote: > On Tue, Mar 20, 2012 at 1:57 AM, Tom Lane wrote: >> That was a long time ago, of course, but I have some vague recollection >> that keeping next-run tuples in the current heap achieves a net savings >> in the total number of comparisons needed

Re: [HACKERS] Memory usage during sorting

2012-03-20 Thread Tom Lane
Greg Stark writes: > http://en.wikipedia.org/wiki/Binary_heap#Building_a_heap Interesting. I'm pretty sure that idea appears nowhere in Knuth (which might mean it's new enough to have a live patent on it ... anybody know who invented this?). But it seems like that should buy back enough compari

Re: [HACKERS] Memory usage during sorting

2012-03-20 Thread Jeff Janes
On Tue, Mar 20, 2012 at 6:31 AM, Tom Lane wrote: > Greg Stark writes: >> Offhand I wonder if this is all because we don't have the O(n) heapify >> implemented. I think we do already have it implemented. 1/2 the time the tuple stays where it is after one comparison, 1/4 it moves up one level wit

Re: [HACKERS] Memory usage during sorting

2012-03-20 Thread Greg Stark
http://en.wikipedia.org/wiki/Binary_heap#Building_a_heap -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Memory usage during sorting

2012-03-20 Thread Tom Lane
Greg Stark writes: > Offhand I wonder if this is all because we don't have the O(n) heapify > implemented. Robert muttered something about that before, but is it real? If you could do that, I'd think you'd have a less-than-n-log-n sorting solution. regards, tom lane --

Re: [HACKERS] Memory usage during sorting

2012-03-20 Thread Greg Stark
On Tue, Mar 20, 2012 at 1:57 AM, Tom Lane wrote: > That was a long time ago, of course, but I have some vague recollection > that keeping next-run tuples in the current heap achieves a net savings > in the total number of comparisons needed to heapify both runs. Offhand I wonder if this is all be

Re: [HACKERS] Memory usage during sorting

2012-03-19 Thread Tom Lane
Greg Stark writes: > On Mon, Mar 19, 2012 at 7:23 PM, Robert Haas wrote: >> There's no real reason why the tuples destined for the next run need >> to be maintained in heap order; we could just store them unordered and >> heapify the whole lot of them when it's time to start the next run. > This

Re: [HACKERS] Memory usage during sorting

2012-03-19 Thread Greg Stark
On Mon, Mar 19, 2012 at 7:23 PM, Robert Haas wrote: > There's no real reason why the tuples destined for the next run need > to be maintained in heap order; we could just store them unordered and > heapify the whole lot of them when it's time to start the next run. This sounded familiar http

Re: [HACKERS] Memory usage during sorting

2012-03-19 Thread Robert Haas
On Sun, Mar 18, 2012 at 11:25 AM, Tom Lane wrote: > So has somebody found a hole in the n log n lower bound on the cost of > comparison-based sorting?  I thought that had been proven pretty > rigorously. There's not much danger of anyone finding a way around that bound since the proof is quite tr

Re: [HACKERS] Memory usage during sorting

2012-03-18 Thread Jeff Janes
On Sun, Mar 18, 2012 at 8:56 AM, Greg Stark wrote: > On Sun, Mar 18, 2012 at 3:25 PM, Tom Lane wrote: No, it's about reducing the number of comparisons needed to maintain the heap property. >> >>> That sounds very interesting.  I didn't know it was even theoretically >>> possible to do

Re: [HACKERS] Memory usage during sorting

2012-03-18 Thread Greg Stark
On Sun, Mar 18, 2012 at 3:25 PM, Tom Lane wrote: >>> No, it's about reducing the number of comparisons needed to maintain >>> the heap property. > >> That sounds very interesting.  I didn't know it was even theoretically >> possible to do that. > > So has somebody found a hole in the n log n lower

Re: [HACKERS] Memory usage during sorting

2012-03-18 Thread Tom Lane
Jeremy Harris writes: > On 2012-03-18 15:25, Tom Lane wrote: >> Yeah, that was me, and it came out of actual user complaints ten or more >> years back. (It's actually not 2X growth but more like 4X growth >> according to the comments in logtape.c, though I no longer remember the >> exact reasons

Re: [HACKERS] Memory usage during sorting

2012-03-18 Thread Jeremy Harris
On 2012-03-18 15:25, Tom Lane wrote: Jeff Janes writes: On Wed, Mar 7, 2012 at 11:55 AM, Robert Haas wrote: The problem there is that none of the files can be deleted until it was entirely read, so you end up with all the data on disk twice. I don't know how often people run their databases s

Re: [HACKERS] Memory usage during sorting

2012-03-18 Thread Tom Lane
Jeff Janes writes: > On Wed, Mar 7, 2012 at 11:55 AM, Robert Haas wrote: >> On Sat, Mar 3, 2012 at 4:15 PM, Jeff Janes wrote: >>> Anyway, I think the logtape could use redoing. > The problem there is that none of the files can be deleted until it > was entirely read, so you end up with all the

Re: [HACKERS] Memory usage during sorting

2012-03-17 Thread Jeff Janes
On Wed, Mar 7, 2012 at 11:55 AM, Robert Haas wrote: > On Sat, Mar 3, 2012 at 4:15 PM, Jeff Janes wrote: > >> Anyway, I think the logtape could use redoing.  When your tapes are >> actually physically tape drives, it is necessary to build up runs one >> after the other on physical tapes, because u

Re: [HACKERS] Memory usage during sorting

2012-03-17 Thread Jeff Janes
On Sat, Mar 17, 2012 at 10:47 AM, Greg Stark wrote: > On Wed, Mar 7, 2012 at 7:55 PM, Robert Haas wrote: >>> But it would mean we have about 1.7x  more runs that need to be merged >>> (for initially random data).  Considering the minimum merge order is >>> 6, that increase in runs is likely not t

Re: [HACKERS] Memory usage during sorting

2012-03-17 Thread Greg Stark
On Wed, Mar 7, 2012 at 7:55 PM, Robert Haas wrote: >> But it would mean we have about 1.7x  more runs that need to be merged >> (for initially random data).  Considering the minimum merge order is >> 6, that increase in runs is likely not to lead to an additional level >> of merging, in which case

Re: [HACKERS] Memory usage during sorting

2012-03-07 Thread Robert Haas
On Sat, Mar 3, 2012 at 4:15 PM, Jeff Janes wrote: > The better solution would be to reduce the overhead in the first > place.  While building the initial runs, there is no reason to have 3 > blocks worth of overhead for each tape, when only one tape is ever > being used at a time.  But that change

Re: [HACKERS] Memory usage during sorting

2012-03-03 Thread Jeff Janes
On Sun, Feb 26, 2012 at 7:20 PM, Robert Haas wrote: > On Sat, Feb 25, 2012 at 4:31 PM, Jeff Janes wrote: >>> I'm not sure about the conclusion, but given this discussion, I'm >>> inclined to mark this Returned with Feedback. >> >> OK, thanks.  Does anyone have additional feed-back on how tightly

Re: [HACKERS] Memory usage during sorting

2012-02-26 Thread Tom Lane
Robert Haas writes: > A quick Google search for external sorting algorithms suggest that the > typical way of doing an external sort is to read data until you fill > your in-memory buffer, quicksort it, and dump it out as a run. Repeat > until end-of-data; then, merge the runs (either in a single

Re: [HACKERS] Memory usage during sorting

2012-02-26 Thread Robert Haas
On Sat, Feb 25, 2012 at 4:31 PM, Jeff Janes wrote: >> I'm not sure about the conclusion, but given this discussion, I'm >> inclined to mark this Returned with Feedback. > > OK, thanks.  Does anyone have additional feed-back on how tightly we > wish to manage memory usage?  Is trying to make us use

Re: [HACKERS] Memory usage during sorting

2012-02-25 Thread Jeff Janes
On Tue, Feb 14, 2012 at 1:44 AM, Hitoshi Harada wrote: > On Sat, Feb 11, 2012 at 11:34 AM, Jeff Janes wrote: >> On Wed, Feb 8, 2012 at 1:01 AM, Hitoshi Harada wrote: >>> On Sun, Jan 15, 2012 at 4:59 PM, Jeff Janes wrote: The attached patch allows it to reuse that memory.  On my meager

Re: [HACKERS] Memory usage during sorting

2012-02-14 Thread Hitoshi Harada
On Sat, Feb 11, 2012 at 11:34 AM, Jeff Janes wrote: > On Wed, Feb 8, 2012 at 1:01 AM, Hitoshi Harada wrote: >> On Sun, Jan 15, 2012 at 4:59 PM, Jeff Janes wrote: >>> >>> The attached patch allows it to reuse that memory.  On my meager >>> system it reduced the building of an index on an integer

Re: [HACKERS] Memory usage during sorting

2012-02-11 Thread Jeff Janes
On Wed, Feb 8, 2012 at 1:01 AM, Hitoshi Harada wrote: > On Sun, Jan 15, 2012 at 4:59 PM, Jeff Janes wrote: >> >> The attached patch allows it to reuse that memory.  On my meager >> system it reduced the building of an index on an integer column in a >> skinny 200 million row totally randomly orde

Re: [HACKERS] Memory usage during sorting

2012-02-08 Thread Hitoshi Harada
On Sat, Feb 4, 2012 at 10:06 AM, Jeff Janes wrote: > > The worst thing about the current memory usage is probably that big > machines can't qsort more than 16,777,216 tuples no matter how much > memory they have, because memtuples has to live entirely within a > single allocation, which is current

Re: [HACKERS] Memory usage during sorting

2012-02-08 Thread Hitoshi Harada
On Sun, Jan 15, 2012 at 4:59 PM, Jeff Janes wrote: > > The attached patch allows it to reuse that memory.  On my meager > system it reduced the building of an index on an integer column in a > skinny 200 million row totally randomly ordered table by about 3% from > a baseline of 25 minutes. > Jus

Re: [HACKERS] Memory usage during sorting

2012-02-04 Thread Jeff Janes
On Sat, Feb 4, 2012 at 10:06 AM, Jeff Janes wrote: > Attached is a completely uncommitable proof of concept/work in > progress patch to get around the limitation.  It shows a 2 fold > improvement when indexing an integer column on a 50,000,000 row > randomly ordered table. Oops, forgot to say th

Re: [HACKERS] Memory usage during sorting

2012-02-04 Thread Jeff Janes
On Sat, Jan 21, 2012 at 4:51 PM, Peter Geoghegan wrote: > On 16 January 2012 00:59, Jeff Janes wrote: >> I think it would be better to pre-deduct the tape overhead amount we >> will need if we decide to switch to tape sort from the availMem before >> we even start reading (and then add it back if

Re: [HACKERS] Memory usage during sorting

2012-01-21 Thread Peter Geoghegan
On 16 January 2012 00:59, Jeff Janes wrote: > I think it would be better to pre-deduct the tape overhead amount we > will need if we decide to switch to tape sort from the availMem before > we even start reading (and then add it back if we do indeed make that > switch).  That way we wouldn't over-

[HACKERS] Memory usage during sorting

2012-01-15 Thread Jeff Janes
In tuplesort.c, it initially reads tuples into memory until availMem is exhausted. It then switches to the tape sort algorithm, and allocates buffer space for each tape it will use. This substantially over-runs the allowedMem, and drives availMem negative. It works off this deficit by writing tu