Hello Heikki,

Thanks for having a look at the patch.

* I think we should drop the "flush" part of this for now. It's not as clearly beneficial as the sorting part, and adds a great deal more code complexity. And it's orthogonal to the sorting patch, so we can deal with it separately.

I agree that it is orthogonal and that the two features could be submitted as distinct patches. The flush part is actually the first patch I submitted, because it has a significant effect on latency, and I was told to merge it with the sorting...

The flushing part really helps to keep "write stalls" under control in many cases, for instance:

- 400 tps, 1 client (4 clients for the medium case), 100 ms max latency

     options   | percent of late transactions
  flush | sort | tiny | small | medium
    off |  off | 12.0 | 64.28 | 68.6
    off |   on | 11.3 | 22.05 | 22.6
     on |  off |  1.1 | 67.93 | 67.9
     on |   on |  0.6 |  3.24 |  3.1

The "percent of late transactions" is really the fraction of time the database is unreachable because of write stalls... So sort without flush is cleary not enough.

Another thing Andres suggested is to fsync as early as possible, but that is not a simple patch because it intermixes things which are currently in distinct parts of the checkpoint processing, so I had already decided to leave it for another submission.

* Is it really necessary to parallelize the I/O among tablespaces? I can see the point, but I wonder if it makes any difference in practice.

I think that if someone bothers to set up tablespaces, there is no reason to kill their performance behind their back. Without sorting you may hope that tablespaces are touched randomly enough, but once buffers are sorted you can easily construct cases where all writes go to one tablespace first and then to the other, leaving half the disks idle.

So I think that it really should be kept.

* Is there ever any harm in sorting the buffers? The GUC is useful for benchmarking, but could we leave it out of the final patch?

I think the performance figures show that it is basically always beneficial, so the GUC may be left out. However, on SSDs it is unclear to me whether sorting is just a waste of time or whether it still helps, say with wear-leveling. Maybe it is best to keep it? In any case, it is definitely needed for testing.

* Do we need to worry about exceeding the 1 GB allocation limit in AllocateCheckpointBufferIds? It's enough to have 2 TB of shared_buffers. That's a lot, but it's not totally crazy these days that someone might do that. At the very least, we need to lower the maximum of shared_buffers so that you can't hit that limit.

Yep.

I ripped out the "flushing" part, keeping only the sorting. I refactored the logic in BufferSync() a bit. There's now a separate function, nextCheckpointBuffer(), that returns the next buffer ID from the sorted list. The tablespace-parallelization behaviour in encapsulated there,

I do not understand the new tablespace-parallelization logic: there is no test on the tablespace of the buffer in the selection process... Note that I did write a proof for the version I submitted, and also did some detailed testing on the side, because I'm always wary of proofs, especially mine :-)

I notice that you assume that tablespace numbers are always small and contiguous. Is that a fact? I felt more at ease relying on a hash table, which avoids that assumption.

* [...] keeping the code in BufferSync() much simpler. See attached. Needs some minor cleanup and commenting still before committing, and I haven't done any testing besides a simple "make check".

Hmmm... one more detail: the patch does not actually sort:

  + if (checkpoint_sort && num_to_write > 1 && false)


I'll resubmit a patch with only the sorting part, and do the kind of restructuring you suggest, which is a good thing.

--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
