Josh Berkus wrote:
Koichi, Andreas,

1) To deal with partial/inconsistent writes to the data files at crash
recovery, we need full page writes at the first modification to each page
after a checkpoint.   This consumes much of the WAL space.
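
For illustration, the overhead is easy to see from psql (a rough sketch;
the table t and its columns are just stand-ins, and pg_current_xlog_location()
is the stock function available since 8.2):

    CHECKPOINT;
    SELECT pg_current_xlog_location();  -- note the starting LSN
    UPDATE t SET c = c WHERE id = 1;    -- first touch of the page after the checkpoint
    SELECT pg_current_xlog_location();  -- LSN advances by roughly a full page image (~8KB)
    UPDATE t SET c = c WHERE id = 1;    -- later touches log only a small record

The first update after the checkpoint carries a backup block of up to
BLCKSZ; subsequent updates to the same page log just the tuple change.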

We need to find a way around this someday. Other DBs don't do this; it may be because they're less durable, or because they fixed the problem.

Maybe both. Fixing the problem may require some means of detecting partial/inconsistent writes to the data files, which may need additional CPU resources.

I don't think there should be only one setting.   It depends on how the
database is operated.   Leaving wal_add_optimization_info = off by default
does not bring any change in WAL and archive log handling.   I
understand some people may not be happy with an additional 3% or so
increase in WAL size, especially people who don't need archive logs at
all.   So I prefer to leave the default off.
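
To be concrete, opting in would look like this in postgresql.conf (the GUC
name is the one proposed by this patch, not in stock PostgreSQL, and the
pg_compresslog invocation and paths are only illustrative):

    # default: wal_add_optimization_info = off  -- WAL and archiving unchanged
    wal_add_optimization_info = on
    archive_command = 'pg_compresslog %p /mnt/archive/%f'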

Except that, is there any reason to turn this off if we are archiving? Maybe it should just be slaved to archive_command ... if we're not using PITR, it's off; if we are, it's on.

Hmm, this sounds workable. On the other hand, existing users who are happy with the current archiving would have to either change their archive command to pg_compresslog or accept a slight increase in archive log size. I'd like to hear some more opinions on this.
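
For the record, the restore side of such a setup would use the companion
pg_decompresslog to rebuild full-size segments before replay; a sketch only,
with an illustrative invocation and path:

    # recovery.conf on the standby / at PITR time
    restore_command = 'pg_decompresslog /mnt/archive/%f %p'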

1) is there any throughput benefit for platforms with fast CPUs but
constrained I/O (e.g. 2-drive webservers)?  Any penalty for servers with
plentiful I/O?

I've only run benchmarks with the archive process running, because
wal_add_optimization_info=on does not make sense if we don't archive
WAL.   In this situation, total I/O decreases because writes to the archive
log decrease.   Because of the 3% or so increase in WAL size, there will be
an increase in WAL writes, but the decrease in archive writes makes up for it.

Yeah, I was just looking for a way to make this a performance feature. I see now that it can't be. ;-)

As to the performance, I tested the patch against 8.3HEAD. The pgbench setup was as follows:
Case 1. Archiver: cp command, wal_add_optimization_info = off
Case 2. Archiver: pg_compresslog, wal_add_optimization_info = on
DB size: 1.65GB, total transactions: 1,000,000

Throughput:
Case 1: 632.69 TPS
Case 2: 653.10 TPS ... about a 3% gain.

Archive log size:
Case 1: 1.92GB
Case 2: 0.57GB (about 30% of Case 1). Before compression, Case 2's WAL was also 1.92GB. Because the measurement is based on whole WAL segment files, there can be up to 16MB of error; taking that into account, the increase in WAL I/O is less than 1%.
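
For reference, a run like the above can be approximated with stock pgbench;
the scale factor is my estimate for a roughly 1.65GB database, and the
client count is arbitrary:

    pgbench -i -s 100 bench          # initialize; ~100 gives a DB around 1.5-1.7GB
    pgbench -c 10 -t 100000 bench    # 10 clients x 100,000 = 1,000,000 transactions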

3) How is this better than command-line compression for log-shipping? E.g. why do we need it in the database?

I don't fully understand what command-line compression means here.   Simon
suggested that this patch can be used with log-shipping, and I agree.
If we compare it with gzip or another general-purpose compressor, the
compression ratio, CPU usage, and I/O of pg_compresslog are all
considerably better than those of gzip.
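
If command-line compression means a general-purpose compressor in
archive_command, the difference would be, for example (both invocations
illustrative):

    archive_command = 'gzip -c %p > /mnt/archive/%f.gz'       # generic: still deflates every full page image
    archive_command = 'pg_compresslog %p /mnt/archive/%f'     # WAL-aware: drops the unneeded page images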

OK, that answered my question.

This is why I don't like Josh's suggested name of wal_compressable:
WAL is compressible either way; only pg_compresslog would need to be
more complex if you don't turn off the full-page optimization. I think a
good name would convey that you are turning off an optimization
(thus my wal_fullpage_optimization on/off).

Well, as a PG hacker I find the name wal_fullpage_optimization quite baffling and I think our general user base will find it even more so. Now that I have Koichi's explanation of the problem, I vote for simply slaving this to the PITR settings and not having a separate option at all.

Could I have a more specific suggestion on this?


Koichi Suzuki
