[HACKERS] Piggybacking vacuum I/O

Heikki Linnakangas Mon, 22 Jan 2007 06:52:28 -0800

I've been looking at the way we do vacuums.

The fundamental performance issue is that a vacuum generates nheapblocks+nindexblocks+ndirtyblocks I/Os. Vacuum cost delay helps to spread the cost out, like paying in installments, but the total is the same. In an I/O-bound system, the extra I/O directly leads to less throughput.

Therefore, we need to do less I/O. A dead space map (DSM) helps by allowing us to skip blocks that don't need vacuuming, reducing the number of I/Os to 2*ndirtyblocks+nindexblocks. That's great, but it doesn't help us if the dead tuples are spread uniformly, because then nearly every block is dirty.
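
To make the arithmetic concrete, here is a small sketch of the two I/O counts above. The split of the terms by phase (heap scan, index scan, heap revisit) is my reading of the formulas rather than something stated explicitly, and the table sizes in the usage note are made-up illustrative numbers.

```python
def full_vacuum_ios(nheapblocks: int, nindexblocks: int, ndirtyblocks: int) -> int:
    # Phase 1 reads every heap block, phase 2 reads every index block,
    # phase 3 revisits only the heap blocks that contain dead tuples.
    return nheapblocks + nindexblocks + ndirtyblocks


def dsm_vacuum_ios(nindexblocks: int, ndirtyblocks: int) -> int:
    # With a dead space map, phase 1 reads only the dirty heap blocks,
    # so each dirty block is read twice (phase 1 and phase 3).
    return 2 * ndirtyblocks + nindexblocks
```

With 100,000 heap blocks, 20,000 index blocks and 5,000 dirty blocks, the full vacuum costs 125,000 I/Os and the DSM version 30,000. If the dead tuples are spread so that all 100,000 heap blocks are dirty, both come to 220,000, which is why the DSM alone doesn't help in that case.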

If we could piggyback the vacuum I/Os on the I/Os that we're doing anyway, vacuum ideally wouldn't have to issue any I/O of its own. I've tried to figure out a way to do that.


Vacuum is done in 3 phases:

1. Scan heap
2. Vacuum index
3. Vacuum heap

Instead of doing a sequential scan, we could perform the 1st phase by watching the buffer pool, scanning blocks for dead tuples when they're in memory and keeping track of which pages we've seen. When all pages have been seen, the tid list is sorted and the 1st phase is done.
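
A minimal model of the piggybacked 1st phase might look like the following. It assumes a hypothetical hook, `on_page_in`, that the buffer manager would call whenever a heap page of the table is read in; no such hook exists in bufmgr today, and a page is simplified to a list of (offset, is_dead) pairs.

```python
from dataclasses import dataclass, field

# A TID is (block number, line-pointer offset).
Tid = tuple[int, int]


@dataclass
class Phase1Tracker:
    nblocks: int
    seen: set[int] = field(default_factory=set)
    dead_tids: list[Tid] = field(default_factory=list)

    def on_page_in(self, block: int, tuples: list[tuple[int, bool]]) -> None:
        # Hypothetical buffer-manager hook: scan the page the first time
        # it shows up in the buffer pool, and ignore later visits.
        if block in self.seen:
            return
        self.seen.add(block)
        self.dead_tids.extend((block, off) for off, dead in tuples if dead)

    def done(self) -> bool:
        # Phase 1 is complete once every block has been seen.
        return len(self.seen) == self.nblocks

    def finish(self) -> list[Tid]:
        # Sort the TID list, e.g. so lookups during the index pass
        # can use binary search.
        assert self.done()
        return sorted(self.dead_tids)
```

The tracker never issues a read itself; it only inspects pages that someone else's query already brought into memory.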

In theory, the index vacuum could also be done that way, but let's assume for now that indexes would be scanned like they are currently.

The 3rd phase can be performed similarly to the 1st phase. Whenever a page enters the buffer pool, we check the tid list and remove any matching tuples from the page. When the list is empty, vacuum is complete.
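
The 3rd phase can be sketched the same way, again assuming a hypothetical `on_page_in` hook. Here a page is modeled as a dict from offset to tuple, and the sorted TID list from phase 1 is regrouped by block so that each page-in only looks at its own entries.

```python
# A TID is (block number, line-pointer offset).
Tid = tuple[int, int]


class Phase3Tracker:
    def __init__(self, sorted_tids: list[Tid]) -> None:
        # Group the dead TIDs from phase 1 by block, so a page-in can look
        # up its own entries without scanning the whole list.
        self.pending: dict[int, set[int]] = {}
        for block, off in sorted_tids:
            self.pending.setdefault(block, set()).add(off)

    def on_page_in(self, block: int, page: dict[int, object]) -> int:
        # Hypothetical buffer-manager hook. `page` maps offset -> tuple;
        # matching dead tuples are removed in place. Returns how many.
        offsets = self.pending.pop(block, set())
        removed = 0
        for off in offsets:
            if off in page:
                del page[off]
                removed += 1
        return removed

    def done(self) -> bool:
        # Vacuum is complete once no block has pending dead TIDs.
        return not self.pending
```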

Of course, there are some issues with the design as described above. For example, the vacuum might take a long time if there are cold spots in the table. In fact, a block full of dead tuples might never be visited again.

A variation of the scheme would be to keep scanning pages that are in cache until the tid list reaches a predefined size, instead of keeping track of which pages have already been seen. That would deal better with tables with hot and cold spots, but it couldn't advance relfrozenxid because there would be no guarantee that all pages are visited. Also, we could start the 1st phase of the next vacuum while we're still in the 3rd phase of the previous one.
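
A sketch of this variation, under the same hypothetical `on_page_in` hook: there is no per-page bookkeeping, and phase 1 ends when the list hits `max_tids`, a hypothetical size limit. Because it can't prove that every page was visited, the class is flagged as unable to advance relfrozenxid.

```python
# A TID is (block number, line-pointer offset).
Tid = tuple[int, int]


class BoundedPhase1:
    """Collect dead TIDs from cached pages until the list is full,
    without recording which pages have been seen."""

    # No complete scan means no guarantee every page was visited, so this
    # variant must never advance relfrozenxid.
    can_advance_relfrozenxid = False

    def __init__(self, max_tids: int) -> None:
        self.max_tids = max_tids
        # A set, because a hot page may enter the buffer pool many times.
        self.dead_tids: set[Tid] = set()

    def full(self) -> bool:
        return len(self.dead_tids) >= self.max_tids

    def on_page_in(self, block: int, tuples: list[tuple[int, bool]]) -> None:
        for off, is_dead in tuples:
            if self.full():
                return
            if is_dead:
                self.dead_tids.add((block, off))

    def finish(self) -> list[Tid]:
        # Phase 1 ends when the list reaches its size limit.
        return sorted(self.dead_tids)
```

Hot pages get collected quickly and cold pages are simply left for a later round, which is the trade-off described above.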

Also, after we've seen 95% of the pages or a timeout expires, we could fetch the rest of them with random I/O to let the vacuum finish.
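
The completion rule could be expressed as a small policy function. The 95% threshold comes from the text above; the function name, the timeout value and the choice to read stragglers in block order are assumptions for illustration.

```python
import time
from typing import Optional


def blocks_to_fetch(nblocks: int, seen: set[int], started_at: float,
                    timeout_s: float, threshold_pct: int = 95,
                    now: Optional[float] = None) -> list[int]:
    """Return the unseen blocks to read with explicit (random) I/O, or an
    empty list if we should keep waiting for them to show up in cache."""
    if now is None:
        now = time.monotonic()
    # Integer arithmetic avoids float rounding at the threshold.
    enough_seen = len(seen) * 100 >= threshold_pct * nblocks
    timed_out = now - started_at >= timeout_s
    if not (enough_seen or timed_out):
        return []
    # Read the stragglers in block order to keep the random I/O as
    # close to sequential as possible.
    return sorted(set(range(nblocks)) - seen)
```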

I'm not sure exactly how this would be implemented. Perhaps bgwriter or autovacuum would do it, or a new background process. Presumably the process would need access to the relcache.

One issue is that if we're trying to vacuum every table simultaneously this way, we'll need more overall memory for the tid lists. I'm hoping there's a way to implement this without requiring shared memory for the tid lists; that would make the memory management a nightmare. Also, we'd need changes to the bufmgr API to support this.
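
A rough sizing helps show the memory concern. Each TID is an ItemPointerData, 6 bytes in PostgreSQL (4-byte block number plus 2-byte offset); the dead-tuple counts in the test are made-up examples. Vacuuming one table at a time needs only the largest list at once, while vacuuming everything simultaneously needs the sum.

```python
# sizeof(ItemPointerData) in PostgreSQL: a 4-byte block number plus a
# 2-byte line-pointer offset.
TID_BYTES = 6


def tid_list_bytes(dead_tuples_per_table: list[int]) -> tuple[int, int]:
    """Return (peak bytes vacuuming one table at a time,
               bytes vacuuming every table simultaneously)."""
    # One at a time: only the largest list has to exist at once.
    sequential = TID_BYTES * max(dead_tuples_per_table, default=0)
    # All at once: every table's list is alive at the same time.
    simultaneous = TID_BYTES * sum(dead_tuples_per_table)
    return sequential, simultaneous
```

For three tables with 1,000,000, 500,000 and 250,000 dead tuples, that is 6,000,000 bytes one at a time versus 10,500,000 bytes all at once.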

This would work nicely with the DSM. The list of pages that need to be visited in phase 1 could be initialized from the DSM, largely avoiding the problem with cold spots.
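
Combining the two ideas, the phase 1 tracker could start from the set of blocks the DSM flags as having dead tuples instead of from every block. The DSM is modeled here as a plain set of block numbers, and `on_page_in` is the same hypothetical hook as before; neither reflects an existing interface.

```python
# A TID is (block number, line-pointer offset).
Tid = tuple[int, int]


class DsmPhase1Tracker:
    def __init__(self, dsm_dirty_blocks: set[int]) -> None:
        # Pages the DSM does not flag are assumed to have no dead tuples,
        # so phase 1 only has to wait for the flagged ones.
        self.pending = set(dsm_dirty_blocks)
        self.dead_tids: list[Tid] = []

    def on_page_in(self, block: int, tuples: list[tuple[int, bool]]) -> None:
        if block not in self.pending:
            return  # clean per the DSM, or already scanned
        self.pending.discard(block)
        self.dead_tids.extend((block, off) for off, dead in tuples if dead)

    def done(self) -> bool:
        return not self.pending

    def finish(self) -> list[Tid]:
        assert self.done()
        return sorted(self.dead_tids)
```

Cold, clean regions of the table never enter `pending`, so they can no longer hold phase 1 open.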


Any thoughts before I start experimenting?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
