Re: [HACKERS] checkpointer continuous flushing

Fabien COELHO Wed, 09 Sep 2015 12:30:12 -0700


Hello Andres,

Wouldn't it be just as easy to put this logic into the checkpointingcode?
Not sure it would simplify anything, because the checkpointer currently
knows about buffers but flushing is about files, which are hidden from
view.


It'd not really simplify things, but it'd keep it local.

Ok, it would be local, but it would also mean that the checkpointer wouldhave to deal explicitely with files, whereas it currently does not haveto.

I think that the current buffer/file boundary is on engineering principlea good one, so I tried to break it as little as possible to enable thefeature, and I wanted to avoid to have to do a buffer to file translationtwice, once in the checkpointer and once when writing the buffer.

* Wouldn't a binary heap over the tablespaces + progress be nicer?


I'm not sure where it would fit exactly.


Imagine a binaryheap.h style heap over a structure like (tablespaceid,
progress, progress_inc, nextbuf) where the comparator compares the progress.

It would replace what is currently an array. The balancing code needs toenumerate all tablespaces in a round-robin way so as to ensure that alltablespaces are given some attention, otherwise you could have a balanceon two tablespaces but others could be left out. The array makes thisproperty straightforward.

Anyway, I think it would complicate the code significantly (compared to the
straightforward array)


I doubt it. I mean instead of your GetNext you'd just do:
   next_tblspc = DatumGetPointer(binaryheap_first(heap));
   if (next_tblspc == 0)
       return 0;
   next_tblspc.progress += next_tblspc.progress_slice;
   binaryheap_replace_first(PointerGetDatum(next_tblspc));

   return next_tblspc.nextbuf++;

Compare to the array, this tree approach would required ln(Ntablespace)work to extract and reinsert the tablespace under progress, so there is nocomplexity advantage.

Moreover, given that in most cases there are 1 or 2 tablespaces, a treestructure is really on the heavy side.

progress_slice is the number of buffers in the tablespace divided by the
number of total buffers, to avoid doing any sort of expensive math in
the more frequently executed path.

If there are many buffers, I'm not too sure about rounding issues and thelike, so the current approach with a rational seems more secure.

[...] I'm not seing where you'd need an extra pointer?


Indeed, I misunderstood.

[...] What for would you need to bsearch?

To find the tablespace boundaries in the sorted buffer array inlog(NBuffers) * Ntablespace, instead of NBuffers.

I think that the current approach is ok as the number of tablespace
should be small.


Right that's often the case.


Yep.

ISTM that using a tablespace in the sorting would reduce the complexity to
ln(NBuffers) * Ntablespace for finding the boundaries, and then Nbuffers *
(Ntablespace/Ntablespace) = NBuffers for scanning, at the expense of more
memory and code complexity.


Afaics finding the boundaries can be done as part of the enumeration of
tablespaces in BufferSync(). That code needs to be moved, but that's not
too bad. I don't see the code be that much more complicated?

Hmmm. you are proposing to replace prooved and heavilly tested code by amore complex tree data structures distributed quite differently around thesource, and no very clear benefit.

So I would prefer to keep the code as is, that is pretty straightforward,and wait for a strong incentive before doing anything fancier. ISTM thatthere are other places in pg need attention more that further tweakingthis patch.


--
Fabien.


--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] checkpointer continuous flushing

Reply via email to