Hello Andres,

Wouldn't it be just as easy to put this logic into the checkpointing code?

Not sure it would simplify anything, because the checkpointer currently
knows about buffers but flushing is about files, which are hidden from
view.

It'd not really simplify things, but it'd keep it local.

Ok, it would be local, but it would also mean that the checkpointer would have to deal explicitely with files, whereas it currently does not have to.

I think that the current buffer/file boundary is on engineering principle a good one, so I tried to break it as little as possible to enable the feature, and I wanted to avoid to have to do a buffer to file translation twice, once in the checkpointer and once when writing the buffer.

* Wouldn't a binary heap over the tablespaces + progress be nicer?

I'm not sure where it would fit exactly.

Imagine a binaryheap.h style heap over a structure like (tablespaceid,
progress, progress_inc, nextbuf) where the comparator compares the progress.

It would replace what is currently an array. The balancing code needs to enumerate all tablespaces in a round-robin way so as to ensure that all tablespaces are given some attention, otherwise you could have a balance on two tablespaces but others could be left out. The array makes this property straightforward.


Anyway, I think it would complicate the code significantly (compared to the
straightforward array)

I doubt it. I mean instead of your GetNext you'd just do:
   next_tblspc = DatumGetPointer(binaryheap_first(heap));
   if (next_tblspc == 0)
       return 0;
   next_tblspc.progress += next_tblspc.progress_slice;
   binaryheap_replace_first(PointerGetDatum(next_tblspc));

   return next_tblspc.nextbuf++;

Compare to the array, this tree approach would required ln(Ntablespace) work to extract and reinsert the tablespace under progress, so there is no complexity advantage.

Moreover, given that in most cases there are 1 or 2 tablespaces, a tree structure is really on the heavy side.

progress_slice is the number of buffers in the tablespace divided by the
number of total buffers, to avoid doing any sort of expensive math in
the more frequently executed path.

If there are many buffers, I'm not too sure about rounding issues and the like, so the current approach with a rational seems more secure.

[...] I'm not seing where you'd need an extra pointer?

Indeed, I misunderstood.

[...] What for would you need to bsearch?

To find the tablespace boundaries in the sorted buffer array in log(NBuffers) * Ntablespace, instead of NBuffers.

I think that the current approach is ok as the number of tablespace
should be small.

Right that's often the case.

Yep.

ISTM that using a tablespace in the sorting would reduce the complexity to
ln(NBuffers) * Ntablespace for finding the boundaries, and then Nbuffers *
(Ntablespace/Ntablespace) = NBuffers for scanning, at the expense of more
memory and code complexity.

Afaics finding the boundaries can be done as part of the enumeration of
tablespaces in BufferSync(). That code needs to be moved, but that's not
too bad. I don't see the code be that much more complicated?

Hmmm. you are proposing to replace prooved and heavilly tested code by a more complex tree data structures distributed quite differently around the source, and no very clear benefit.

So I would prefer to keep the code as is, that is pretty straightforward, and wait for a strong incentive before doing anything fancier. ISTM that there are other places in pg need attention more that further tweaking this patch.

--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to