Gokulakannan Somasundaram wrote:
> Finally I found some time to look more into the CRC code. The
> time is mostly taken when there is a backup block in the XLOG structure.
> When it calculates the CRC for the entire block (there is already some
> optimization for the holes), the time is spent in the CRC macro. I tried
> some small optimizations, like changing the int into uint8, thinking that
> the exponentiation operation might get slightly faster, with no success. I
> don't know whether changing the CRC algorithm is a good option.
Yeah, I believe the current implementation is quite well optimized, so it's unlikely that you'll find any micro-optimizations like that which make a significant difference. I'm afraid you're going to need bigger changes to make it faster. Some ideas that I've had in the past that might be worth investigating:

* Use a much bigger WAL block size, to avoid splitting each big record, such as one with a backup block. Making one block as large as a whole WAL segment might not be a bad idea.
* mmap the WAL segments, instead of using the slru buffers. Even if it didn't help with performance per se, we could get rid of the wal_buffers setting and let the OS manage the caching.
* Use a cheaper CRC algorithm.
* Calculate the CRC when flushing the WAL, instead of when inserting it. This would allow the WAL writer to pick up much of the CRC work, increasing throughput on multi-core systems, particularly on bulk loads.
* Cover multiple WAL records with one CRC. Presumably, calculating the CRC of one large contiguous chunk is cheaper than calculating it for n smaller chunks.
* Allow buffering of multiple WAL segments, to avoid the WAL flush on every xlog segment switch.
* Reorder WAL records on the fly, so that if you have something like "full page image + update + update + update + update" all happening on a single page, we could skip writing the individual update records and instead take a full page image of the page after the last update. (Getting all the locking right is quite tricky...)
* Change the locking so that you don't need to hold the WALInsertLock while memcpying the record to the WAL block, but only to allocate the slot for it. This wouldn't increase single-process performance, but would make it more scalable.

I haven't pursued any of these further, so some or all of them might turn out to be unhelpful or infeasible.
-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com