Re: [HACKERS] checkpointer continuous flushing

Fabien COELHO Thu, 07 Jan 2016 13:29:53 -0800


Hello Andres,

Hmmm. What I understood is that the workloads that have some performance
regressions (regressions that I have *not* seen in the many tests I ran) are
not due to checkpointer IOs, but rather in settings where most of the writes
are done by backends or bgwriter.


As far as I can see you've not run many tests where the hot/warm data
set is larger than memory (the full machine's memory, not
shared_buffers).


Indeed, I think I ran some, but not many with such characteristics.

That quite drastically alters the performance characteristics here,
because you suddenly have lots of synchronous read IO thrown into the
mix.


If I understand this point correctly...

I would expect the overall performance to be abysmal in such a situation
because you get only intermixed *random* reads and writes: as you point
out, synchronous *random* reads (very slow), but on the write side the
IOs are mostly random as well on the checkpointer side because there is
not much to aggregate to get sequential writes.

Now why would that degrade performance significantly? For me it should
render the sorting/flushing less and less effective, and it would go back
to the previous performance levels...

Or maybe it is only the flushing itself which degrades performance, as you
point out, because then you have some synchronous (synced) writes as well
as reads, as opposed to just the reads before without the patch.

If this is indeed the issue, then the solution to avoid the regression is
*not* to flush, so that the OS IO scheduler is less constrained in its job
and can be slightly more effective (well, we are talking of abysmal random
IO disk performance here, so effective would be between slightly more or
less very very very bad).

Maybe a trick could be not to aggregate and flush when buffers in the same
file are too far apart anyway, for instance, based on some threshold?
This can be implemented locally when deciding whether to merge buffer
flushes or not, and whether to flush or not, so it would fit the current
code quite simply.
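
To make the threshold idea concrete, here is a minimal sketch in C. It is
only one possible reading of the suggestion, not code from the patch: the
PendingFlush struct, the FLUSH_MERGE_MAX_GAP value and both helper names
are hypothetical, and block numbers are assumed to be relative to one file.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold: largest gap (in blocks) that is still merged. */
#define FLUSH_MERGE_MAX_GAP 32

/* Hypothetical pending flush range inside one file, in block numbers. */
typedef struct PendingFlush
{
    bool     valid;        /* false while nothing is pending */
    int      file_id;      /* file the pending range belongs to */
    uint32_t first_block;  /* first block of the range */
    uint32_t last_block;   /* last block of the range (inclusive) */
    int      nblocks;      /* number of writes merged into the range */
} PendingFlush;

/* Is the new block in the same file and close enough to be merged? */
static bool
should_merge(const PendingFlush *p, int file_id, uint32_t block)
{
    uint32_t gap;

    if (!p->valid || p->file_id != file_id)
        return false;
    if (block >= p->first_block && block <= p->last_block)
        return true;
    gap = (block > p->last_block) ? block - p->last_block
                                  : p->first_block - block;
    return gap <= FLUSH_MERGE_MAX_GAP;
}

/*
 * Record a written block. Returns true when the previous range should be
 * flushed explicitly, i.e. when it aggregated more than one write. An
 * isolated block is dropped without a flush and left to the OS scheduler.
 */
static bool
add_block(PendingFlush *p, int file_id, uint32_t block)
{
    bool flush_previous;

    if (should_merge(p, file_id, block))
    {
        if (block < p->first_block)
            p->first_block = block;
        else if (block > p->last_block)
            p->last_block = block;
        p->nblocks++;
        return false;
    }

    flush_previous = p->valid && p->nblocks > 1;

    /* Start a new range with the current block. */
    p->valid = true;
    p->file_id = file_id;
    p->first_block = block;
    p->last_block = block;
    p->nblocks = 1;
    return flush_previous;
}
```

With purely random writes almost every range stays at one block, so this
sketch would issue almost no explicit flushes, which is the intended
fallback to the pre-patch behaviour.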

Now my understanding of the sync_file_range call is that it is a hint to
flush the stuff, but it is still asynchronous in nature, so whether it
would impact performance that badly depends on the OS IO scheduler. Also,
I would like to check whether, under the "regressed performance" (in tps
terms) that you observed, pg is more or less responsive. It could be that
the average performance is better but pg is offline longer on fsync. In
which case, I would consider it better to have lower tps in such cases
*if* pg responsiveness is significantly improved.
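
For reference, the kind of call being discussed looks roughly like the
sketch below. It assumes Linux with glibc, and hint_flush_range is a
hypothetical helper name, not a function from the patch. With only
SYNC_FILE_RANGE_WRITE the kernel is asked to start write-out of the dirty
pages in the range without waiting for completion, although depending on
the kernel and the device queue the call may still block. It gives no
durability guarantee, so an fsync is still needed at the end.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* sync_file_range() is a Linux-specific extension */
#endif
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hypothetical helper: hint the kernel to start writing back the dirty
 * pages of fd in [offset, offset + nbytes). Returns 0 on success, or -1
 * with errno set on failure.
 */
static int
hint_flush_range(int fd, off_t offset, off_t nbytes)
{
    return sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
}
```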


Would you have these measures for the regression runs you observed?

Whether it's bgwriter or not I've not fully been able to establish, but
it's a working theory.


Ok, that is something to check, to either confirm or refute it.

Given the above discussion, I think my suggestion may be wrong: as the tps
is low because of random read/write accesses, then not many buffers are
modified (so the bgwriter/backends won't need to make space), the
checkpointer does not have much to write (good), *but* all of it is random
(bad).

I do not see the point of rewriting the checkpointer for them, although
obviously I agree that something has to be done also for the other
processes.


Rewriting the checkpointer and fixing the flush interface in a more
generic way aren't the same thing at all.

Hmmm, probably I misunderstood something in the discussion. It started
with an implementation strategy, but it drifted into discussing a
performance regression. I agree that these are two different subjects.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
