Re: [HACKERS] Memory usage during sorting

Tom Lane Sun, 18 Mar 2012 08:25:40 -0700

Jeff Janes <[email protected]> writes:
> On Wed, Mar 7, 2012 at 11:55 AM, Robert Haas <[email protected]> wrote:
>> On Sat, Mar 3, 2012 at 4:15 PM, Jeff Janes <[email protected]> wrote:
>>> Anyway, I think the logtape could use redoing.


> The problem there is that none of the files can be deleted until it
> was entirely read, so you end up with all the data on disk twice.  I
> don't know how often people run their databases so close to the edge
> on disk space that this matters, but someone felt that that extra
> storage was worth avoiding.

Yeah, that was me, and it came out of actual user complaints ten or more
years back.  (It's actually not 2X growth but more like 4X growth
according to the comments in logtape.c, though I no longer remember the
exact reasons why.)  We knew when we put in the logtape logic that we
were trading off speed for space, and we accepted that.  It's possible
that with the growth of hard drive sizes, real-world applications would
no longer care that much about whether the space required to sort is 4X
data size rather than 1X.  Or then again, maybe their data has grown
just as fast and they still care.

> As a desirable side effect, I think it would mean
> that we could dispense with retail palloc and pfree altogether.  We
> could just allocate a big chunk of memory, copy tuples into it until
> it's full, using a pointer to keep track of the next unused byte, and
> then, after writing the run, reset the allocation pointer back to the
> beginning of the buffer.  That would not only avoid the cost of going
> through palloc/pfree, but also the memory overhead imposed by
> bookkeeping and power-of-two rounding.

That would be worthwhile, probably.  The power-of-2 rounding in palloc
is not nice at all for tuplesort's purposes.  We've occasionally talked
about inventing a different memory context type that is less willing to
waste space ... but on the other hand, such an allocator would run
slower, so you're right back to making a speed for space tradeoff.
If the tuples could be deallocated in bulk then that catch-22 goes away.

>> No, it's about reducing the number of comparisons needed to maintain
>> the heap property.

> That sounds very interesting.  I didn't know it was even theoretically
> possible to do that.

So has somebody found a hole in the n log n lower bound on the cost of
comparison-based sorting?  I thought that had been proven pretty
rigorously.

                        regards, tom lane

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Memory usage during sorting

Reply via email to