On 01/23/2017 01:40 PM, Amit Kapila wrote:
> On Mon, Jan 23, 2017 at 3:56 PM, Tomas Vondra
> <tomas.von...@2ndquadrant.com> wrote:
>> On 01/23/2017 09:57 AM, Amit Kapila wrote:
>>>
>>> On Mon, Jan 23, 2017 at 1:18 PM, Tomas Vondra
>>> <tomas.von...@2ndquadrant.com> wrote:
>>>>
>>>> On 01/23/2017 08:30 AM, Amit Kapila wrote:
>>>>>
>>>>> I think if we can get data for pgbench read-write workload when data
>>>>> doesn't fit in shared buffers but fits in RAM, that can give us some
>>>>> indication.  We can try by varying the ratio of shared buffers w.r.t
>>>>> data.  This should exercise the checksum code both when buffers are
>>>>> evicted and at next read.  I think it also makes sense to check the
>>>>> WAL data size for each of those runs.
>>>>>
>>>> Yes, I'm thinking that's pretty much the worst case for OLTP-like
>>>> workloads, because it has to evict buffers from shared buffers,
>>>> generating a stream of writes. Doing that on good storage (e.g. PCI-e
>>>> SSD or possibly tmpfs) will further limit the storage overhead, making
>>>> the time spent computing checksums much more significant. Makes sense?
>>>>
>>> Yeah, I think that can be helpful with respect to WAL, but for data,
>>> if we are considering the case where everything fits in RAM, then
>>> faster storage might or might not help.
>>>
>> I'm not sure I understand. Why wouldn't faster storage help? It's only a
>> matter of generating enough dirty buffers (that get evicted from shared
>> buffers) to saturate the storage.
>>
> When the page gets evicted from shared buffers, it is just pushed to
> the kernel; the real write to disk won't happen until the kernel feels
> like it. They are written to storage later when a checkpoint occurs.
> So, now if we have a fast storage subsystem then it can improve the
> writes from kernel to disk, but I'm not sure how much that can help in
> improving TPS.

I don't think that's quite true. If the pages are evicted by the bgwriter, then since 9.6 there's a flush every 512kB. This will also flush data written by backends, of course. But even without the flushing, the OS does not postpone writing the dirty data until the very last moment, as that would cause a huge I/O spike. Instead, the OS will write the dirty data to disk once it has been dirty for 30 seconds, or after accumulating some predefined amount of dirty data.
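
To make the "flush every 512kB" point concrete, here is a minimal Python sketch of the flush-after idea, assuming 8kB pages (so 64 pages per flush, matching the bgwriter_flush_after default on Linux). The class and method names are hypothetical and this is not PostgreSQL's actual writeback code: it only counts pages handed to the kernel and issues a writeback request each time 64 of them have accumulated (on Linux, PostgreSQL makes that request with sync_file_range(); the sketch merely counts it).

```python
# Hypothetical constants: 8 kB pages, 512 kB flush threshold
# (the bgwriter_flush_after default on Linux since 9.6).
PAGE_SIZE_KB = 8
FLUSH_AFTER_KB = 512
FLUSH_AFTER_PAGES = FLUSH_AFTER_KB // PAGE_SIZE_KB  # 64 pages


class WritebackTracker:
    """Counts pages handed to the kernel but not yet hinted for writeback."""

    def __init__(self, flush_after_pages: int = FLUSH_AFTER_PAGES) -> None:
        self.flush_after_pages = flush_after_pages
        self.pending = 0          # pages written since the last writeback hint
        self.flush_requests = 0   # number of writeback hints issued so far

    def page_written(self) -> None:
        """Record one evicted dirty page passed to the kernel via write()."""
        self.pending += 1
        if self.pending >= self.flush_after_pages:
            self.issue_writeback()

    def issue_writeback(self) -> None:
        # Stand-in for the real request (sync_file_range() on Linux);
        # here we only count it and reset the pending counter.
        self.flush_requests += 1
        self.pending = 0


wb = WritebackTracker()
for _ in range(1000):  # evict 1000 dirty pages (~8 MB)
    wb.page_written()
print(wb.flush_requests, wb.pending)  # 15 40
```

So out of ~8 MB of evicted pages, all but the last 40 pages (320 kB) have already been pushed toward disk, rather than sitting in the page cache until a checkpoint.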

So the system will generally get into a "stable state" where it writes about the same amount of data to disk on average.
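
A toy Python model of the "stable state" argument: the database hands the kernel a constant amount of dirty data per second, and the kernel writes data back once it has been dirty for 30 seconds, or, optionally, whenever total dirty data exceeds a limit. The function and parameter names are hypothetical and the model is deliberately much simpler than real Linux writeback (whose corresponding knobs are vm.dirty_expire_centisecs and vm.dirty_background_bytes / vm.dirty_background_ratio).

```python
from collections import deque


def simulate_writeback(dirty_rate_kb, seconds, expire_s=30, background_limit_kb=None):
    """Return the kB written to disk in each simulated second (toy model)."""
    dirty = deque()  # [age_s, kb] chunks of dirty page-cache data, oldest first
    written_per_second = []
    for _ in range(seconds):
        for chunk in dirty:  # everything already dirty gets one second older
            chunk[0] += 1
        dirty.append([0, dirty_rate_kb])  # new dirty data from evictions
        written = 0
        # Age-based writeback: flush data that has been dirty for expire_s seconds.
        while dirty and dirty[0][0] >= expire_s:
            written += dirty.popleft()[1]
        # Amount-based writeback: flush the oldest data until under the limit.
        if background_limit_kb is not None:
            total = sum(kb for _, kb in dirty)
            while total > background_limit_kb:
                take = min(dirty[0][1], total - background_limit_kb)
                dirty[0][1] -= take
                total -= take
                written += take
                if dirty[0][1] == 0:
                    dirty.popleft()
        written_per_second.append(written)
    return written_per_second


steady = simulate_writeback(dirty_rate_kb=100, seconds=60)
print(steady[:3], steady[-3:])  # [0, 0, 0] [100, 100, 100]

capped = simulate_writeback(dirty_rate_kb=100, seconds=60, background_limit_kb=1000)
print(capped.index(100))  # 10
```

With only the age-based rule nothing reaches the disk for the first 30 seconds, after which writes settle at 100 kB/s; with a 1000 kB limit they start after 10 seconds and settle at the same rate. Either way the disk ends up absorbing, on average, exactly what the database dirties.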


Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)