Hello Heikki,

The reason I didn't commit this back then was lack of performance testing. I'm fairly confident that this would be a significant improvement for some workloads, and shouldn't hurt much even in the worst case. But I did only a little testing on my laptop. I think Simon was in favor of just committing it immediately, and Fabien wanted to see more performance testing before committing.

I confirm. To summarize my opinion:

I think that the 1.5 value somewhere in the patch is much too high for the purpose, because it shifts the checkpoint load quite a lot (50% more load at the end of the checkpoint) just to avoid a spike which lasts a few seconds (I think) at the beginning. A much smaller value should be used (1.0 <= factor < 1.1), as it would be much less disruptive and would probably avoid the issue just the same. I recommend against committing with a 1.5 factor in any case.

Another issue I raised is that the load change occurs with both xlog- and time-triggered checkpoints, and I'm not sure it should be applied in both cases.

Another issue is that the patch makes sense when the WAL & relations are on the same disk, but might degrade performance otherwise.

Another point is that it potentially interacts with a patch I submitted which has a large impact on performance (an order of magnitude better in some cases, by sorting & flushing blocks on checkpoints), so it would make sense to check that.

So more testing is definitely needed. A GUC would be nice for this purpose, especially to look at different factors.

I was hoping that Digoal would re-run his original test case, and report back on whether it helps. Fabien had a performance test setup, for testing another patch, but he didn't want to run it to test this patch.

Indeed, I have, but I'm quite behind at the moment, so I cannot promise anything. Moreover, I'm not sure I see this "spike" issue in my setting, AFAICR.


Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)