On 10/03/2014 04:10 PM, Andres Freund wrote:
On 2014-10-03 15:51:37 +0300, Heikki Linnakangas wrote:
After a lot of experimentation, I figured out that the slowdown is already
apparent with the *first* patch, the one that just refactors existing
XLogInsert, without any changes to the WAL format or to the callers. I
fiddled with that for a long time, trying to tease apart the change that
makes the difference, and was finally able to narrow it down.

Attached are two patches. These are both just refactoring, with no changes
to the WAL format or APIs. The first is the same as the refactoring patch I
posted earlier, with only minor changes to #includes, comments and such (per
Alvaro's and Michael's suggestions - thanks!). With the first patch, the
test case I've been using to performance test this becomes somewhere between
5% and 10% slower. Applying the second patch on top of that restores the
performance to what you get without these patches.

Strange. The second patch moves the CRC calculation from a separate loop
over the XLogRecDatas into the earlier loop, which also iterates through the
XLogRecDatas. The strange thing about this is that when I tried to make the
same change to current git master, without applying the first patch, it
didn't make any difference. The CRC calculation used to be integrated into the
earlier loop in 9.1 and before, but in 9.2 it was moved to a separate loop
for simplicity, because it didn't make any difference to performance.
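
To make the two loop structures concrete, here is a minimal standalone sketch.
It is not the code from either patch: Chunk is a hypothetical stand-in for the
XLogRecData chain, and crc_update is a plain bitwise CRC-32 rather than the
CRC macros PostgreSQL actually uses. The first variant walks the chain twice
(lengths first, CRC in a separate loop); the second folds the CRC update into
the single earlier loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a chain of record data chunks. */
typedef struct Chunk
{
    const char   *data;
    size_t        len;
    struct Chunk *next;
} Chunk;

/* Plain bitwise CRC-32 (reflected polynomial 0xEDB88320); illustrative only. */
static uint32_t
crc_update(uint32_t crc, const void *buf, size_t len)
{
    const unsigned char *p = buf;

    while (len-- > 0)
    {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Variant A: sum the lengths in one loop, compute the CRC in a second loop. */
static uint32_t
crc_separate_loop(const Chunk *chain, size_t *total_len)
{
    uint32_t    crc = 0xFFFFFFFFu;
    size_t      len = 0;

    assert(total_len != NULL);

    for (const Chunk *c = chain; c != NULL; c = c->next)
        len += c->len;

    for (const Chunk *c = chain; c != NULL; c = c->next)
        crc = crc_update(crc, c->data, c->len);

    *total_len = len;
    return crc ^ 0xFFFFFFFFu;
}

/* Variant B: do both jobs in a single pass over the chain. */
static uint32_t
crc_fused_loop(const Chunk *chain, size_t *total_len)
{
    uint32_t    crc = 0xFFFFFFFFu;
    size_t      len = 0;

    assert(total_len != NULL);

    for (const Chunk *c = chain; c != NULL; c = c->next)
    {
        len += c->len;
        crc = crc_update(crc, c->data, c->len);
    }

    *total_len = len;
    return crc ^ 0xFFFFFFFFu;
}
```

Both variants return the same checksum and total length for the same chain;
they differ only in how many times the chain is traversed.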

So I now have a refactoring patch ready that I'd like to commit (the
attached two patches together), but to be honest, I have no idea why the
second patch is so essential to performance.

Maybe a stupid question, but did you verify it's not caused by splitting
xloginsert over two translation units and/or not inlining the separate

Hmph, now that I try this again, I don't see any difference with or without the "second" patch - they both perform just as well as unpatched master. So I'm totally baffled again, but I guess the patch is good to go. I need a more stable test environment.

Yeah, I did try merging xlog.c and xloginsert.c into the same .c file at some point, and marking the XLogInsertRecData function as static, allowing the compiler to inline it, but I didn't see any difference. But you have to take that with a grain of salt - I'm apparently not capable of constructing a properly repeatable test case for this.

- Heikki

