Re: [HACKERS] vacuum, performance, and MVCC

PFC Sat, 24 Jun 2006 09:58:50 -0700

What I see in this discussion is a huge amount of "the grass must be
greener on the other side" syndrome, and hardly any recognition that
every technique has its downsides and complications.


        Sure ;)

MVCC generates dead rows, by its very nature ; however I see two trendsin this :


        1* A large transaction that updates/deletes many rows.

For instance suppose you UPDATE an entire table whose size is larger thanmemory.

Old row versions have to be kept somewhere until commit, be it in thetable itself or in some accessory undo-log.So, there will be a lot of harddisk grinding anyway, be it MVCC orOracle-style, or whatever. MVCC will bloat the table and indexes, thenVACUUM will shrink them. Update-in-place systems will bloat an undo log.

It seems to me the current MVCC+VACUUM is the right tool for this job,requiring about the same amount of IO that the others.Vacuum scans sequentially, so it's the best way to process large volumesof data.


        2* Many short transactions update rows in a table
        Like the sessions problem, for instance.

        Current VACUUM sucks for this case, I guess that's known.

        -------

So, we have two different problems, and one tool which is adapted to oneproblem only. Should the tool (Vacuum) be fixed to handle both problems,making it more complex and difficult to maintain, or should another toolbe created specifically for the second problem ?Problem 2 is very different from problem 1. The only case when they meetis when there is a continuous stream of small updates running concurrentlywith a long transaction.

        So, what is the ideal tool for case 2 ?

We'd want a vacuuming machine that can be run very often, or even better,continuously.The table can be large, larger than the disk cache, so scanning it is notan option.The updates are probably randomly distributed into the table. Therefore,VACUUMing a long time after these transactions are commited and the pagesare no longer in cache would require a lot of random seeks, which is alsobad.

        Besides, we want this vacuum to be continuous and transparent.

The best time to vacuum pages is, thus, when they are still in thebackground writer's memory, or the disk cache, waiting to be flushed todisk. There, they can be re-read, vacuumed and re-written with no seekpenalty, only additional WAL traffic. However the amount of WAL traffic inbytes/s is less important that the frequency of WAL syncs. Emitting moreWAL data shouldn't be a problem if those sync writes are coalesced withthe sync writes of current reansactions.

So, I guess the best solution for case 2 is to have the background writerperform on-the-fly VACUUM :

An UPDATE or DELETE transaction hands over dirty pages to be written tothe bgwriter. It also tells the bgwriter the ID of the current transactionand flags specifying if they contain candidate dead rows.The bgwriter should have a sufficiently large buffer in order to be ableto keep these pages in memory until all the transactions that can see thedead rows in these pages are finished.

        Then, the pages are vacuumed and written.

The key is the size of the buffer. It should be large enough to containenough pages so that it is actually possible to vacuum something out ofthem before writing them. However if the buffer capacity is exceeded (forinstance, because there is a long running transaction), this is not aproblem : the pages are simply written to disk normally, they will containdead rows, which will need to be handled lated by the standard VACUUM.

I think this would maximize code reuse by using the current bgwriter'sarchitecture... did I miss something ?















---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
      choose an index scan if your joining column's datatypes do not
      match

Re: [HACKERS] vacuum, performance, and MVCC

Reply via email to