Thanks a lot for your comments/advices. I'd like to write some feedback.
Simon Riggs wrote:
On Tue, 2007-03-27 at 11:52 +0900, Koichi Suzuki wrote:
Here's an update of a code to improve full page writes as proposed in
Update includes some modification for error handling in archiver and
In the previous threads, I posted several evaluation and shown that we
can keep all the full page writes needed for full XLOG crash recovery,
while compressing archive log size considerably better than gzip, with
less CPU consumption. I've found no further objection for this proposal
but still would like to hear comments/opinions/advices.
Looks interesting. I like the small amount of code to do this.
A few thoughts:
- Not sure why we need "full_page_compress", why not just mark them
always? That harms noone. (Did someone else ask for that? If so, keep
No, no one asked to have a separate option. There'll be no bad
influence to do so. As written below, full page write can be
categolized as follows:
1) Needed for crash recovery: first page update after each checkpoint.
This has to be kept in WAL.
2) Needed for archive recovery: page update between pg_start_backup and
pg_stop_backup. This has to be kept in archive log.
3) For log-shipping slave such as pg_standby: no full page writes will
be needed for this purpose.
My proposal deals with 2). So, if we mark each "full_page_write", I'd
rather mark when this is needed. Still need only one bit because the
case 3) does not need any mark.
- OTOH I'd like to see an explicit parameter set during recovery since
you're asking the main recovery path to act differently in case a single
bit is set/unset. If you are using that form of recovery, we should say
so explicitly, to keep everybody else safe.
Only one thing I had to do is to create "dummy" full page write to
maintain LSNs. Full page writes are omitted in archive log. We have to
LSNs same as those in the original WAL. In this case, recovery has to
read logical log, not "dummy" full page writes. On the other hand, if
both logical log and "real" full page writes are found in a log record,
the recovery has to use "real" full page writes.
- I'd rather mark just the nonremovable blocks. But no real difference
It sound nicer. According to the full page write categories above, we
can mark full page writes as needed in "crash recovery" or "archive
recovery". Please give some feedback to the above full page write
- We definitely don't want an normal elog in a XLogInsert critical
section, especially at DEBUG1 level
I agree. I'll remove elog calls from critical sections.
- diff -c format is easier and the standard
I'll change the diff option.
- pg_logarchive and pg_logrestore should be called by a name that
reflects what they actually do. Possibly pg_compresslog and
pg_decompresslog etc.. I've not reviewed those programs, but:
I wasn't careful to have command names more based on what to be done.
I'll change the command name.
- Not sure why we have to compress away page headers. Touch as little as
you can has always been my thinking with recovery code.
Eliminating page headers gives compression rate slightly better, a
couple of percents. To make code simpler, I'll drop this compression.
- I'm very uncomfortable with touching the LSN. Maybe I misunderstand?
The reason why we need pg_logarchive (or pg_decompresslog) is to
maintain LSN the same as those in the original WAL. For this purpose,
we need "dummy" full page write. I don't like to touch LSN either and
this is the reason why pg_decompresslog need some extra work,
eliminating many other changes in the recovery.
- Have you thought about how pg_standby would integrate with this
option? Can you please?
Yes I believe so. As pg_standby does not include any chance to meet
partial writes of pages, I believe you can omit all the full page
writes. Of course, as Tom Lange suggested in
removing full page writes can lose a chance to recover from
partial/inconsisitent writes in the crash of pg_standby. In this case,
we have to import a backup and archive logs (with full page writes
during the backup) to recover. (We have to import them when the file
system crashes anyway). If it's okay, I believe
pg_compresslog/pg_decompresslog can be integrated with pg_standby.
Maybe we can work together to include pg_compresslog/pg_decompresslog in
- I'll do some docs for this after Freeze, if you'd like. I have some
other changes to make there, so I can do this at the same time.
I'll be looking forward to your writings.
---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not