Hi, Here're some feedback to the comment:

Simon Riggs wrote:
On Wed, 2007-03-28 at 10:54 +0900, Koichi Suzuki wrote:

As written below, full page write can be
categolized as follows:

1) Needed for crash recovery: first page update after each checkpoint.
This has to be kept in WAL.

2) Needed for archive recovery: page update between pg_start_backup and
pg_stop_backup. This has to be kept in archive log.

3) For log-shipping slave such as pg_standby: no full page writes will
be needed for this purpose.

My proposal deals with 2). So, if we mark each "full_page_write", I'd
rather mark when this is needed. Still need only one bit because the
case 3) does not need any mark.

I'm very happy with this proposal, though I do still have some points in
detailed areas.

If you accept that 1 & 2 are valid goals, then 1 & 3 or 1, 2 & 3 are
also valid goals, ISTM. i.e. you might choose to use full_page_writes on
the primary and yet would like to see optimised data transfer to the
standby server. In that case, you would need the mark.

Yes, I need the mark. In my proposal, only unmarked full-page-writes, which were written as the first update after a checkpoint, are to be removed offline (pg_compresslog).

- Not sure why we need "full_page_compress", why not just mark them
always? That harms noone. (Did someone else ask for that? If so, keep
No, no one asked to have a separate option. There'll be no bad
influence to do so.  So, if we mark each "full_page_write", I'd
rather mark when this is needed. Still need only one bit because the
case 3) does not need any mark.

OK, different question: Why would anyone ever set full_page_compress = off?
Why have a parameter that does so little? ISTM this is:

i) one more thing to get wrong

ii) cheaper to mark the block when appropriate than to perform the if()
test each time. That can be done only in the path where backup blocks
are present.

iii) If we mark the blocks every time, it allows us to do an offline WAL
compression. If the blocks aren't marked that option is lost. The bit is
useful information, so we should have it in all cases.

Not only full-page-writes are written as WAL record. In my proposal, both full-page-writes and logical log are written in a WAL record, which will make WAL size slightly bigger (five percent or so). If full_page_compress = off, only a full-page-write will be written in a WAL record. I thought someone will not be happy with this size growth.

I agree to make this mandatory if every body is happy with extra logical log in WAL records with full page writes.

I'd like to have your opinion.

- OTOH I'd like to see an explicit parameter set during recovery since
you're asking the main recovery path to act differently in case a single
bit is set/unset. If you are using that form of recovery, we should say
so explicitly, to keep everybody else safe.
Only one thing I had to do is to create "dummy" full page write to
maintain LSNs. Full page writes are omitted in archive log. We have to
LSNs same as those in the original WAL. In this case, recovery has to
read logical log, not "dummy" full page writes. On the other hand, if
both logical log and "real" full page writes are found in a log record,
the recovery has to use "real" full page writes.

I apologise for not understanding your reply, perhaps my original
request was unclear.

In recovery.conf, I'd like to see a parameter such as

dummy_backup_blocks = off (default) | on

to explicitly indicate to the recovery process that backup blocks are
present, yet they are garbage and should be ignored. Having garbage data
within the system is potentially dangerous and I want to be told by the
user that they were expecting that and its OK to ignore that data.
Otherwise I want to throw informative errors. Maybe it seems OK now, but
the next change to the system may have unintended consequences and it
may not be us making the change. "It's OK the Alien will never escape
from the lab" is the starting premise for many good sci-fi horrors and I
want to watch them, not be in one myself. :-)

We can call it other things, of course. e.g.

So far, we don't need any modification to the recovery and redo functions. They ignore the dummy and apply logical logs. Also, if there are both full page writes and logical log, current recovery selects full page writes to apply.

I agree to introduce this option if 8.3 code introduces any conflict to the current. Or, we could introduce this option for future safety. Do you think we should introduce this option?

If this should be introduced now, what we should do is to check this option when dummy full-page-write appears.

Yes I believe so. As pg_standby does not include any chance to meet
partial writes of pages, I believe you can omit all the full page
writes. Of course, as Tom Lange suggested in
removing full page writes can lose a chance to recover from
partial/inconsisitent writes in the crash of pg_standby. In this case,
we have to import a backup and archive logs (with full page writes
during the backup) to recover. (We have to import them when the file
system crashes anyway). If it's okay, I believe
pg_compresslog/pg_decompresslog can be integrated with pg_standby.

Maybe we can work together to include pg_compresslog/pg_decompresslog in pg_standby.

ISTM there are two options.

I think this option is already possible:

1. Allow pg_decompresslog to operate on a file, replacing it with the
expanded form, like gunzip, so we would do this:
  restore_command = 'pg_standby %f decomp.tmp && pg_decompresslog
decomp.tmp %p'

though the decomp.tmp file would not get properly initialised or cleaned
up when we finish.

whereas this will take additional work

2. Allow pg_standby to write to stdin, so that we can do this:
  restore_command = 'pg_standby %f | pg_decompresslog - %p'

Both seem to work fine. pg_decompresslog will read entire file at the beginning and so both two will be equivallent. To make the second option run quicker, pg_decompresslog needs some modification to read WAL record one after another.

Anyway, could you try to run pg_standby with pg_compresslog and pg_decompresslog?

Additional recomment on page header removal:

I found that it is not simple to keep page header in the compressed archive log. Because we eliminate unmarked full page writes and shift the rest of the WAL file data, it is not simple to keep page header as the page header in the compressed archive log. It is much simpler to remove page header as well and rebuild them. I'd like to keep current implementation in this point.

Any suggestions are welcome.

I'll modify the name of the commands and post it hopefully within 20hours.

With Best Regards;

Koichi Suzuki

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Reply via email to