On Wed, Aug 30, 2017 at 3:14 PM, Peter Geoghegan <p...@bowt.ie> wrote: > This is significantly faster, in a way that's clearly reproducible and > consistent, despite the fact that we need about 10 merge passes > without replacement selection, and only have enough memory for 7 > tapes. I think that I could find a case that makes replacement > selection look much worse, if I tried.
Just to see where we end up, I quickly wrote a patch to remove replacement selection + replacement_sort_tuples. The LOC impact worked out like this: $ git diff master --stat doc/src/sgml/config.sgml | 39 ---- src/backend/utils/init/globals.c | 1 - src/backend/utils/misc/guc.c | 10 - src/backend/utils/misc/postgresql.conf.sample | 1 - src/backend/utils/sort/tuplesort.c | 403 +++++------------------------------ src/include/miscadmin.h | 1 - src/test/regress/expected/cluster.out | 17 +- src/test/regress/sql/cluster.sql | 14 +- 8 files changed, 52 insertions(+), 434 deletions(-) It was nice to be able to remove comments that make certain distinctions that replacement selection cares about. These were tedious to read. I managed to remove several paragraphs within tuplesort.c's header comments. Another advantage of the changes made that I noticed in passing is that SortTuple.tupindex is now only used while merging. It would be quite possible to remove SortTuple.tupindex entirely, as an additional piece of work, by providing some other indirection to get that information when merging. From there, we could replace SortTuple.tuple with a bitfield, that has one bit for NULLness, and treats other bits as a 63-bit or 31-bit integer. This integer would be used an offset into a buffer that we repalloc(), thus removing all retail palloc()s, even during run generation. Using one big buffer for tuples would make tuplesort.c quite a bit more memory efficient (SortTuple will only be 16 bytes, we won't waste memory on palloc() headers, and sequential access is made more cache efficient in presorted pass-by-reference cases). I'm not planning to work on this myself. It is similar to something that Heikki described last year, but goes a bit further by eliminating all palloc() header overhead, while also not playing tricks with reclaiming bits from pointers (I recall that Tom didn't like that aspect of Heikki's proposal at all). There would be new infrastructure required to make this work -- we'd have to be able to ask routines like ExecCopySlotMinimalTuple() and heap_copytuple() to copy into our own large tuple buffer, rather than doing their own palloc() on tuplesort's behalf. But that seems like a good idea anyway. I may submit the simple patch to remove replacement selection, if other contributors are receptive. Apart from everything else, the "incrementalism" of replacement selection works against cleverer batch memory management of the type I just outlined, which seems like a useful direction to take tuplesort in. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers