Gokulakannan Somasundaram wrote:
> Finally I found some time to look more into the CRC code. The
> time is mostly taken when there is a back-up block in the XLOG structure.
> When it calculates the CRC for the entire block (there is some optimization
> already for the holes), the time is spent on the CRC macro. I tried doing
> some small optimization like changing the int into uint8, thinking that the
> exponentiation operation might get slightly faster, with no success. I don't
> know whether changing the CRC algorithm is a good option.
Yeah, I believe the current implementation is quite well-optimized, so
it's unlikely that you'll find any micro-optimizations like that that
make any significant difference. I'm afraid you're going to need bigger
changes to make it faster. Some ideas that I've had in the past that
might be worth investigating:
* Use a much bigger WAL block size, to avoid splitting each big record,
like one with a backup block. One block / one WAL segment might not be a
* mmap the WAL segments, instead of using the slru buffers. Even if it
didn't help with performance per se, we could get rid of the wal_buffers
setting and let the OS manage the caching.
* Use a cheaper CRC algorithm.
* Calculate CRC when flushing, instead of inserting, the WAL. This would
allow the WAL writer to pick up much of the CRC work, increasing
throughput on multi-core systems, particularly on bulk loads.
* Cover multiple WAL records with the CRC. Presumably calculating the
CRC of a large contiguous chunk is cheaper than n smaller chunks.
* Allow buffering of multiple WAL segments, to avoid the WAL flush on
every xlog segment switch.
* Reorder WAL records on the fly, so that if you have something like
"full page image + update + update + update + update" all happening on a
single page, we could skip writing the individual update records, and
take a full page image of the page after the last update instead.
(getting all the locking right is quite tricky...)
* Change the locking so that you don't need to hold the WALInsertLock
while memcpying the record to the WAL block, but just to allocate the
slot for it. Wouldn't increase single-process performance, but would
make it more scalable.
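A minimal sketch of that last idea, assuming a pthread mutex as a stand-in for WALInsertLock and a flat in-memory buffer with no wraparound. All names here (wal_buf, wal_reserve, wal_insert) are hypothetical and are not PostgreSQL's actual code:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define WAL_BUF_SIZE 8192           /* hypothetical buffer size */

char wal_buf[WAL_BUF_SIZE];         /* stand-in for the shared WAL buffer */
size_t wal_insert_pos = 0;          /* next free byte, protected by insert_lock */
pthread_mutex_t insert_lock = PTHREAD_MUTEX_INITIALIZER;

/* Reserve len bytes. The lock is held only for the position bump. */
size_t wal_reserve(size_t len)
{
    size_t start;

    pthread_mutex_lock(&insert_lock);
    /* This sketch does not handle a full buffer or wraparound. */
    assert(len <= WAL_BUF_SIZE - wal_insert_pos);
    start = wal_insert_pos;
    wal_insert_pos += len;
    pthread_mutex_unlock(&insert_lock);

    return start;
}

/* Copy the record into its reserved slot without holding the lock. */
size_t wal_insert(const void *rec, size_t len)
{
    size_t start = wal_reserve(len);

    memcpy(wal_buf + start, rec, len);
    return start;
}
```

What the sketch leaves out is probably the hard part: whoever flushes the buffer would need some way to know that every copy into the range being flushed has finished.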
I haven't pursued any of these further, so some or all of them might
turn out to be not helpful or not feasible.
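To make the CRC-related ideas a bit more concrete, here is a rough sketch of a table-driven running CRC. It assumes the common reflected CRC-32 polynomial 0xEDB88320, which is not necessarily what the XLOG code uses, and the function names are made up for illustration. The point is that the per-byte cost is one table lookup plus a shift and two XORs, and that the running value can be fed in pieces, so one CRC could cover several records or be computed in one pass at flush time:

```c
#include <stddef.h>
#include <stdint.h>

uint32_t crc_table[256];
int crc_table_ready = 0;

/* Build the 256-entry lookup table for the reflected polynomial. */
void crc32_init_table(void)
{
    uint32_t i;
    int k;

    for (i = 0; i < 256; i++)
    {
        uint32_t c = i;

        for (k = 0; k < 8; k++)
            c = (c & 1) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        crc_table[i] = c;
    }
    crc_table_ready = 1;
}

/* Start value for a new running CRC. */
uint32_t crc32_begin(void)
{
    return 0xFFFFFFFFu;
}

/* Feed len bytes into the running CRC; may be called repeatedly. */
uint32_t crc32_update(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    if (!crc_table_ready)
        crc32_init_table();
    while (len--)
        crc = crc_table[(crc ^ *p++) & 0xFF] ^ (crc >> 8);
    return crc;
}

/* Final inversion to get the finished checksum. */
uint32_t crc32_finish(uint32_t crc)
{
    return crc ^ 0xFFFFFFFFu;
}
```

Feeding two adjacent chunks through crc32_update() one after the other gives the same result as feeding them as one contiguous chunk, so any saving from covering multiple records would come from fewer begin/finish steps and fewer stored checksums rather than from the per-byte loop itself.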