On 31.03.2011 11:33, 高增琦 wrote:
Consider an example:
1. Deletes on two pages emit two log records: (1, page1, vm_clear_1) and
(2, page2, vm_clear_2).
2. "vm_clear_1" and "vm_clear_2" are on the same VM page.
3. A checkpoint happens, the VM page gets torn, and vm_clear_2 is lost.
4. A delete on another page emits one log record, (3, page1, vm_clear_3);
vm_clear_3 is still on that VM page.
5. Power down.
6. Startup: redo will replay all changes after the checkpoint, but vm_clear_2
will never be cleared.
Am I right?

No. A page can only be torn at a hard crash, i.e. at step 5. A checkpoint flushes all changes to disk; once the checkpoint finishes, all the changes before it are safe on disk.

If you crashed between steps 2 and 3, the VM page might be torn so that only one of the vm_clears has made it to disk but the other has not. But the WAL records for both are on disk anyway, so that will be corrected at replay.
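
As a rough illustration of that last point, here is a toy model in Python (not PostgreSQL code; WalRecord, replay and the dict standing in for a VM page are all made up for this sketch). It assumes both WAL records reached disk before the crash and that replaying a clear is idempotent:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WalRecord:
    """Hypothetical stand-in for a WAL record that clears one VM bit."""
    lsn: int
    heap_block: int  # heap page whose all-visible bit gets cleared


def replay(vm_bits, wal, redo_lsn):
    """Re-apply every VM clear logged at or after redo_lsn."""
    recovered = dict(vm_bits)
    for rec in wal:
        if rec.lsn >= redo_lsn:
            # Clearing an already-cleared bit changes nothing,
            # so replaying a record twice is harmless.
            recovered[rec.heap_block] = False
    return recovered


# Both clears were WAL-logged, and the log was flushed before the crash.
wal = [WalRecord(lsn=1, heap_block=1), WalRecord(lsn=2, heap_block=2)]

# Hard crash while the VM page was being written: a torn write where
# only the first clear reached disk and the second bit is still set.
torn_vm = {1: False, 2: True}

# No checkpoint completed in between, so redo starts before both records.
recovered = replay(torn_vm, wal, redo_lsn=1)
```

After replay both bits are cleared regardless of which half of the torn page made it to disk.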

Another question:
To address the problem in
http://archives.postgresql.org/pgsql-hackers/2010-02/msg02097.php,
should we just clear the VM bit before logging the insert/update/delete?
This may reduce performance; is there another solution?


Yeah, that's a straightforward way to fix it. I don't think the performance
hit will be too bad. But we need to be careful not to hold locks while doing
I/O, which might require some rearrangement of the code. We might want to do
a dance similar to the one we do in vacuum: call visibilitymap_pin first, then
lock and update the heap page, and then set the VM bit while holding the
lock on the heap page.

Do you mean we should lock the heap page first, then get the block number,
then release the heap page,
then pin the VM page, then lock both the heap page and the VM page?
As Robert Haas said, when we lock the heap page again, there may not be
enough free space on it.

I think the sequence would have to be:

1. Pin the heap page.
2. Check if the all-visible flag is set on the heap page (without lock). If it is, pin the VM page.
3. Lock the heap page, check that it has enough free space.
4. Check again if the all-visible flag is set. If it is but we didn't pin the VM page yet, release the lock and loop back to step 2.
5. Update heap page
6. Update vm page
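
A minimal sketch of that sequence as a toy Python model, just to show the control flow. The HeapPage and VisibilityMap classes and modify_tuple are invented for illustration and are not the backend's API. It assumes that "update vm page" means clearing the bit for an insert/update/delete, and that pinning is the only step that may do I/O:

```python
import threading


class HeapPage:
    """Toy heap page: a lock, a free-space counter and the all-visible flag."""
    def __init__(self, free_space, all_visible):
        self.lock = threading.Lock()
        self.free_space = free_space
        self.all_visible = all_visible
        self.pin_count = 0


class VisibilityMap:
    """Toy VM page holding the single bit for one heap page."""
    def __init__(self):
        self.pinned = False
        self.bit_set = True

    def pin(self):
        # Stands in for the step that may need I/O; call it with no lock held.
        self.pinned = True

    def clear(self):
        assert self.pinned, "VM page must be pinned before it is updated"
        self.bit_set = False


def modify_tuple(heap, vm, needed_space):
    heap.pin_count += 1                          # 1. pin the heap page
    vm_pinned = False
    while True:
        if heap.all_visible and not vm_pinned:   # 2. unlocked check
            vm.pin()
            vm_pinned = True
        heap.lock.acquire()                      # 3. lock, check free space
        if heap.free_space < needed_space:
            heap.lock.release()
            heap.pin_count -= 1
            return False                         # caller must pick another page
        if heap.all_visible and not vm_pinned:   # 4. flag got set meanwhile
            heap.lock.release()
            continue                             # back to step 2
        break
    try:
        heap.free_space -= needed_space          # 5. update the heap page
        if heap.all_visible:
            heap.all_visible = False
            vm.clear()                           # 6. update VM under heap lock
    finally:
        heap.lock.release()
        heap.pin_count -= 1
    return True
```

The free-space check happens only after the lock is taken, so the concern about space disappearing while the page is unlocked shows up as the early return in step 3 rather than as a failed update.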

Is there a way to just stop the checkpoint for a while?

Not at the moment. It wouldn't be hard to add, though. I was about to add a mechanism for that last autumn to fix a similar issue with b-tree parent pointer updates (http://archives.postgresql.org/message-id/4ccfee61.2090...@enterprisedb.com), but in the end it was solved differently.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
