On Tue, Sep 6, 2016 at 5:20 AM, Heikki Linnakangas <hlinn...@iki.fi> wrote: > I wrote a quick patch to test that, attached. It seems to improve > performance, at least in this small test case: > > create table lotsofints(i integer); > insert into lotsofints select random() * 1000000000.0 from > generate_series(1, 10000000); > vacuum freeze; > > select count(*) FROM (select * from lotsofints order by i) t; > > On my laptop, with default work_mem=4MB, that select takes 7.8 s on > unpatched master, and 6.2 s with the attached patch.
The benefits have a lot to do with OS read-ahead, and efficient use of memory bandwidth during the merge, where we want to access the caller tuples sequentially per tape (i.e. that's what the batch memory stuff added -- it also made much better use of available memory). Note that I've been benchmarking the parallel CREATE INDEX patch on a server with many HDDs, since sequential performance is mostly all that matters. I think that in 1999, the preloading had a lot more to do with logtape.c's ability to aggressively recycle blocks during merges, such that the total storage overhead does not exceed the original size of the caller tuples (plus what it calls "trivial bookkeeping overhead" IIRC). That's less important these days, but still matters some (it's more of an issue when you can't complete the sort in one pass, which is rare these days). Offhand, I would think that taken together this is very important. I'd certainly want to see cases in the hundreds of megabytes or gigabytes of work_mem alongside your 4MB case, even just to be able to talk informally about this. As you know, the default work_mem value is very conservative. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers