On Sun, Dec 28, 2014 at 12:45 PM, Peter Geoghegan <p...@heroku.com> wrote:

> On Sun, Dec 28, 2014 at 12:37 PM, Jeff Davis <pg...@j-davis.com> wrote:
> > Do others have similar numbers? I'm quite surprised at how little
> > work_mem seems to matter for these plans (HashJoin might be a different
> > story though). I feel like I made a mistake -- can someone please do a
> > sanity check on my numbers?
>
> I have seen external sorts that were quicker than internal sorts
> before. With my abbreviated key patch, under certain circumstances
> external sorts are faster, while presumably the same thing is true of
> int4 attribute sorts today. Actually, I saw a 10MB work_mem setting
> that was marginally faster than a multi-gigabyte one that fit the
> entire sort in memory. It probably has something to do with caching
> effects dominating over the expense of more comparisons, since higher
> work_mem settings that still resulted in an external sort were slower
> than the 10MB setting.
>
> I was surprised by this too, but it has been independently reported by
> Jeff Janes.
>

I don't recall (at the moment) seeing our external sort actually faster
than quick-sort, but I've very reliably seen external sorts get faster with
less memory than with more.  It is almost certainly a CPU caching issue.
Very large simple binary heaps are horrible on the CPU cache. And for
sort-by-reference values, quick sort is also pretty bad.

With a slow enough data bus between the CPU and main memory, I don't doubt
that a 'tapesort' with small work_mem could actually be faster than
quicksort with large work_mem.  But I don't recall seeing it myself.  But
I'd be surprised that a tapesort as currently implemented would be faster
than a quicksort if the tapesort is using just one byte less memory than
the quicksort is.

But to Jeff Davis's question, yes, tapesort is not very sensitive to
work_mem, and to the extent it is sensitive it is in the other direction of
more memory being bad.  Once work_mem is so small that it takes multiple
passes over the data to do the merge, then small memory would really be a
problem.  But on modern hardware you have to get pretty absurd settings
before that happens.

Cheers,

Jeff

Reply via email to