Rethink checkpointer's fsync-request table representation.

Instead of having one hash table entry per relation/fork/segment, just have one per relation, and use bitmapsets to represent which specific segments need to be fsync'd. This eliminates the need to scan the whole hash table to implement FORGET_RELATION_FSYNC, which fixes the O(N^2) behavior recently demonstrated by Jeff Janes for cases involving lots of TRUNCATE or DROP TABLE operations during a single checkpoint cycle. Per an idea from Robert Haas.
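
To illustrate the per-relation layout described above, here is a minimal, self-contained sketch. It is not the code in md.c. Every name in it (SketchEntry, sketch_remember_fsync, sketch_forget_relation, and so on) is hypothetical, a single uint32_t stands in for the real relation key, a fixed 64-bit mask per fork stands in for a bitmapset, and a small linear-probing array stands in for the backend's hash table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* All constants and names below are assumptions made for this sketch. */
#define SKETCH_NFORKS     4    /* forks tracked per relation */
#define SKETCH_MAXSEGS    64   /* one bit per segment in a uint64_t */
#define SKETCH_TABLE_SIZE 256  /* fixed table size, must be a power of two */

typedef struct SketchEntry
{
    bool     used;                 /* slot holds a relation */
    uint32_t relid;                /* stand-in for the real relation key */
    uint64_t segs[SKETCH_NFORKS];  /* bit i set => segment i needs fsync */
} SketchEntry;

static SketchEntry sketch_table[SKETCH_TABLE_SIZE];

/*
 * Find the entry for relid, optionally creating it.  Entries are never
 * removed in this sketch, so plain linear probing stays correct.
 */
static SketchEntry *
sketch_lookup(uint32_t relid, bool create)
{
    uint32_t start = (relid * 2654435761u) & (SKETCH_TABLE_SIZE - 1);

    for (uint32_t i = 0; i < SKETCH_TABLE_SIZE; i++)
    {
        SketchEntry *e = &sketch_table[(start + i) & (SKETCH_TABLE_SIZE - 1)];

        if (e->used && e->relid == relid)
            return e;
        if (!e->used)
        {
            if (!create)
                return NULL;
            memset(e, 0, sizeof(*e));
            e->used = true;
            e->relid = relid;
            return e;
        }
    }
    return NULL;                /* table full */
}

/* Record that one segment of one fork needs an fsync. */
static bool
sketch_remember_fsync(uint32_t relid, int fork, int segno)
{
    assert(fork >= 0 && fork < SKETCH_NFORKS);
    assert(segno >= 0 && segno < SKETCH_MAXSEGS);

    SketchEntry *e = sketch_lookup(relid, true);

    if (e == NULL)
        return false;
    e->segs[fork] |= (UINT64_C(1) << segno);
    return true;
}

/* Cancel every pending request for a relation: one lookup, no table scan. */
static void
sketch_forget_relation(uint32_t relid)
{
    SketchEntry *e = sketch_lookup(relid, false);

    if (e != NULL)
        memset(e->segs, 0, sizeof(e->segs));
}

/* Report whether a given segment still has a pending request. */
static bool
sketch_is_pending(uint32_t relid, int fork, int segno)
{
    SketchEntry *e = sketch_lookup(relid, false);

    return e != NULL && ((e->segs[fork] >> segno) & 1) != 0;
}
```

With this layout, forgetting a relation costs one lookup plus clearing its per-fork sets, regardless of how many other relations have pending requests. With one entry per relation/fork/segment, the same operation has to visit every entry in the table.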

(FORGET_DATABASE_FSYNC still sucks, but since dropping a database is a pretty expensive operation anyway, we'll live with that.)

In passing, improve the delayed-unlink code: remove the pass over the list in mdpreckpt, since it wasn't doing anything for us except supporting a useless Assert in mdpostckpt, and fix mdpostckpt so that it will absorb fsync requests every so often when clearing a large backlog of deletion requests.

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/be86e3dd5b42c33387ae976c014e6276c9439f7f

Modified Files
--------------
src/backend/storage/smgr/md.c |  437 ++++++++++++++++++++++++-----------------
1 files changed, 256 insertions(+), 181 deletions(-)

--
Sent via pgsql-committers mailing list
