Hi,
On 2017-04-07 19:43:39 -0300, Claudio Freire wrote:
> On Fri, Apr 7, 2017 at 5:05 PM, Andres Freund <and...@anarazel.de> wrote:
> > Hi,
> >
> > I've *not* read the history of this thread. So I really might be
> > missing some context.
> >
> >
> >> From e37d29c26210a0f23cd2e9fe18a264312fecd383 Mon Sep 17 00:00:00 2001
> >> From: Claudio Freire <klaussfre...@gmail.com>
> >> Date: Mon, 12 Sep 2016 23:36:42 -0300
> >> Subject: [PATCH] Vacuum: allow using more than 1GB work mem
> >>
> >> Turn the dead_tuples array into a structure composed of several
> >> exponentially bigger arrays, to enable usage of more than 1GB
> >> of work mem during vacuum and thus reduce the number of full
> >> index scans necessary to remove all dead tids when the memory is
> >> available.
> >
> >>  * We are willing to use at most maintenance_work_mem (or perhaps
> >>  * autovacuum_work_mem) memory space to keep track of dead tuples. We
> >> - * initially allocate an array of TIDs of that size, with an upper limit that
> >> + * initially allocate an array of TIDs of 128MB, or an upper limit that
> >>  * depends on table size (this limit ensures we don't allocate a huge area
> >> - * uselessly for vacuuming small tables). If the array threatens to overflow,
> >> - * we suspend the heap scan phase and perform a pass of index cleanup and page
> >> - * compaction, then resume the heap scan with an empty TID array.
> >> + * uselessly for vacuuming small tables). Additional arrays of increasingly
> >> + * large sizes are allocated as they become necessary.
> >> + *
> >> + * The TID array is thus represented as a list of multiple segments of
> >> + * varying size, beginning with the initial size of up to 128MB, and growing
> >> + * exponentially until the whole budget of (autovacuum_)maintenance_work_mem
> >> + * is used up.
> >
> > When the chunk size is 128MB, I'm a bit unconvinced that using
> > exponential growth is worth it. The allocator overhead can't be
> > meaningful in comparison to collecting 128MB dead tuples, the potential
> > waste is pretty big, and it increases memory fragmentation.
>
> The exponential strategy is mainly to improve lookup time (ie: to
> avoid large segment lists).

Well, if we were to do binary search on the segment list, that'd not be
necessary.  (Rough, untested sketch at the end of this mail.)

> >> +	if (seg->num_dead_tuples >= seg->max_dead_tuples)
> >> +	{
> >> +		/*
> >> +		 * The segment is overflowing, so we must allocate a new segment.
> >> +		 * We could have a preallocated segment descriptor already, in
> >> +		 * which case we just reinitialize it, or we may need to repalloc
> >> +		 * the vacrelstats->dead_tuples array. In that case, seg will no
> >> +		 * longer be valid, so we must be careful about that. In any case,
> >> +		 * we must update the last_dead_tuple copy in the overflowing
> >> +		 * segment descriptor.
> >> +		 */
> >> +		Assert(seg->num_dead_tuples == seg->max_dead_tuples);
> >> +		seg->last_dead_tuple = seg->dt_tids[seg->num_dead_tuples - 1];
> >> +		if (vacrelstats->dead_tuples.last_seg + 1 >= vacrelstats->dead_tuples.num_segs)
> >> +		{
> >> +			int		new_num_segs = vacrelstats->dead_tuples.num_segs * 2;
> >> +
> >> +			vacrelstats->dead_tuples.dt_segments = (DeadTuplesSegment *) repalloc(
> >> +				(void *) vacrelstats->dead_tuples.dt_segments,
> >> +				new_num_segs * sizeof(DeadTuplesSegment));
> >
> > Might be worth breaking this into some sub-statements, it's quite hard
> > to read.
>
> Breaking what precisely? The comment?

No, the three-line statement computing the new value of
dead_tuples.dt_segments.  I'd at least assign dead_tuples to a local
variable, to cut the length of the statement down.
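Something along these lines, say - untested, and DeadTuplesMultiArray is
only my guess at what the patch names the struct behind
vacrelstats->dead_tuples, so adjust to whatever it's actually called:

	DeadTuplesMultiArray *dt = &vacrelstats->dead_tuples;

	if (dt->last_seg + 1 >= dt->num_segs)
	{
		int		new_num_segs = dt->num_segs * 2;

		/* grow the array of segment descriptors */
		dt->dt_segments = (DeadTuplesSegment *)
			repalloc(dt->dt_segments,
					 new_num_segs * sizeof(DeadTuplesSegment));

		/* ... rest of the new-segment setup unchanged ... */
	}

The same local would shorten the while loop just below, too.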
> >> +			while (vacrelstats->dead_tuples.num_segs < new_num_segs)
> >> +			{
> >> +				/* Initialize as "unallocated" */
> >> +				DeadTuplesSegment *nseg = &(vacrelstats->dead_tuples.dt_segments[
> >> +					vacrelstats->dead_tuples.num_segs]);
> >
> > dito.
>
> I don't really get what you're asking here.

Trying to simplify/shorten the statement.

> >> +/*
> >>  *	lazy_tid_reaped() -- is a particular tid deletable?
> >>  *
> >>  *		This has the right signature to be an IndexBulkDeleteCallback.
> >>  *
> >> - *		Assumes dead_tuples array is in sorted order.
> >> + *		Assumes the dead_tuples multiarray is in sorted order, both
> >> + *		the segment list and each segment itself, and that all segments'
> >> + *		last_dead_tuple fields up to date
> >>  */
> >>  static bool
> >>  lazy_tid_reaped(ItemPointer itemptr, void *state)
> >
> > Have you done performance evaluation about potential performance
> > regressions in big indexes here? IIRC this can be quite frequently
> > called?
>
> Yes, the benchmarks are upthread. The earlier runs were run on my
> laptop and made little sense, so I'd ignore them as inaccurate. The
> latest run[1] with a pgbench scale of 4000 gave an improvement in CPU
> time (ie: faster) of about 20%. Anastasia did another one[2] and saw
> improvements as well, roughly 30%, though it's not measuring CPU time
> but rather elapsed time.

I'd be more concerned about cases that'd already fit into memory, not
ones where we avoid doing another scan - and I think you mostly
measured that?

- Andres
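PS: to make the binary-search-on-the-segment-list idea above a bit more
concrete, here's roughly the shape of lazy_tid_reaped() I have in mind.
It's an untested sketch: DeadTuplesMultiArray and the segment field names
are just my reading of your patch, it assumes last_dead_tuple is kept up
to date for every segment, and it reuses the existing vac_cmp_itemptr
comparator from vacuumlazy.c.

static bool
lazy_tid_reaped(ItemPointer itemptr, void *state)
{
	LVRelStats *vacrelstats = (LVRelStats *) state;
	DeadTuplesMultiArray *dt = &vacrelstats->dead_tuples;
	DeadTuplesSegment *seg;
	ItemPointer res;
	int			lo = 0;
	int			hi = dt->last_seg;

	/*
	 * Bisect the segment list on last_dead_tuple to find the one segment
	 * that could contain itemptr, then bsearch within that segment.  Two
	 * O(log n) lookups, so segment sizes don't need to grow exponentially
	 * just to keep the segment list short.
	 */
	while (lo < hi)
	{
		int			mid = lo + (hi - lo) / 2;

		if (ItemPointerCompare(itemptr,
							   &dt->dt_segments[mid].last_dead_tuple) > 0)
			lo = mid + 1;
		else
			hi = mid;
	}

	seg = &dt->dt_segments[lo];
	res = (ItemPointer) bsearch((void *) itemptr,
								(void *) seg->dt_tids,
								seg->num_dead_tuples,
								sizeof(ItemPointerData),
								vac_cmp_itemptr);
	return (res != NULL);
}

With that, per-lookup cost stays at O(log(num_segs) + log(tuples_per_seg))
no matter how the segments are sized, which is why I don't think the
exponential growth buys much once the chunks are 128MB.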