On Fri, 2 May 2008, Tom Lane wrote:

> The case for varying BLCKSZ is marginal already, and I've seen none at all for varying XLOG_BLCKSZ.

I recall someone on the performance list who felt it was useful to increase XLOG_BLCKSZ to support a high-write environment with WAL shipping, just to make sending the files over the network more efficient. I can't seem to find a reference in the archives, though.

If you look at things like the giant Sun system tests, there was significant tuning getting all the block sizes to line up better with the underlying hardware. I would not be surprised to discover that sort of install gains a bit from slinging WAL files around in larger chunks as well. They're already using small values for commit_delay just to get the typical WAL write to be in larger blocks.

As PostgreSQL makes its way into higher-throughput environments, it wouldn't surprise me to discover more of these situations where switching WAL segments every 16MB turns into a bottleneck. Right now it may only affect a few people in the world, but saying "that's big enough" about any allocation usually turns out to be wrong if you wait long enough.
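To put the 16MB figure in perspective, here is a quick back-of-the-envelope sketch: at a sustained WAL generation rate R, segment files fill at R / 16MB per second, and each one is a file that has to be switched to and, with WAL shipping, sent somewhere. The write rates used below are hypothetical examples, not measurements from any real system, and the function name is illustrative only.

```python
# Back-of-the-envelope: how often are WAL segments switched at a given
# sustained WAL write rate? 16MB is the default segment size discussed
# above; the example write rates are hypothetical.

MB = 1024 * 1024
DEFAULT_SEGMENT_BYTES = 16 * MB


def segment_switches_per_second(wal_bytes_per_second, segment_bytes=DEFAULT_SEGMENT_BYTES):
    """Return how many WAL segment files are filled per second."""
    if segment_bytes <= 0:
        raise ValueError("segment size must be positive")
    return wal_bytes_per_second / segment_bytes


if __name__ == "__main__":
    for rate_mb in (10, 100, 400):
        per_sec = segment_switches_per_second(rate_mb * MB)
        print(f"{rate_mb:>4} MB/s of WAL -> {per_sec:.2f} segments/s "
              f"({per_sec * 3600:.0f} files/hour to ship)")
```

Doubling the segment size halves the number of files switched and shipped for the same WAL volume, which is the trade-off being weighed here.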

One real concern I have with making this easier to adjust is that I'd hate to let people pick any old block size with the default wal_sync_method, only to have them later discover they can't turn on any direct I/O write method because they botched the alignment restrictions.
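A minimal sketch of the kind of sanity check this concern suggests: before accepting a custom WAL block size, confirm it is a power of two and a multiple of the alignment that direct I/O expects. The 4096-byte alignment is an assumption (real requirements depend on the OS, filesystem, and device, and 512 bytes is also common), and `check_xlog_blcksz` is a hypothetical helper, not an actual PostgreSQL function or build rule.

```python
# Hypothetical sanity check for a custom WAL block size. The alignment
# value is an assumption: direct I/O (e.g. O_DIRECT on Linux) typically
# requires write sizes and offsets to be multiples of the device's
# logical block size, often 512 or 4096 bytes.

ASSUMED_DIRECT_IO_ALIGNMENT = 4096


def is_power_of_two(n):
    """True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def check_xlog_blcksz(blcksz, alignment=ASSUMED_DIRECT_IO_ALIGNMENT):
    """Return a list of problems with blcksz; an empty list means it looks usable."""
    problems = []
    if not is_power_of_two(blcksz):
        problems.append(f"{blcksz} is not a power of two")
    if blcksz % alignment != 0:
        problems.append(
            f"{blcksz} is not a multiple of the {alignment}-byte direct I/O alignment")
    return problems


if __name__ == "__main__":
    for candidate in (8192, 2048, 12288):
        issues = check_xlog_blcksz(candidate)
        print(candidate, "OK" if not issues else "; ".join(issues))
```

Rejecting a bad value up front, rather than letting it surface later when someone switches wal_sync_method, is the point of a check like this.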

> Another issue though is whether it makes sense for XLOG_BLCKSZ to be different from BLCKSZ at all, at least in the default case. They are both the unit of I/O and it's not clear why you'd want different units.

There are lots of people who use completely different physical or logical disk setups for the WAL disk than for the regular database. That's going to get even more varied moving forward as SSDs start getting used more, since those devices have a very different set of block size optimization characteristics compared with traditional RAID setups. They prefer smaller blocks to match the underlying flash better, and you don't pay as much of a penalty for writing that way because lining up with a spinning disk isn't important. Someone who put either the database or the WAL on SSD and the other on traditional disk might end up wanting very different block sizes for each.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)