Hi,
On 2013-06-26 18:52:30 +0300, Heikki Linnakangas wrote:
* Could you document the way slots prevent checkpoints from occurring
when XLogInsert rechecks for full page writes? I think it's correct -
but not very obvious on a glance.
There's this in the comment near the top of the file:
* To update RedoRecPtr or fullPageWrites, one has to make sure that all
* subsequent inserters see the new value. This is done by reserving all the
* insertion slots before changing the value. XLogInsert reads RedoRecPtr and
* fullPageWrites after grabbing a slot, so by holding all the slots
* simultaneously, you can ensure that all subsequent inserts see the new
* value. Those fields change very seldom, so we prefer to be fast and
* non-contended when they need to be read, and slow when they're changed.
Does that explain it well enough? XLogInsert holds onto a slot while it
rechecks for full page writes.
Yes. Earlier on that was explained in XLogInsert() - maybe point to the
documentation on top of the file? The file is too big to expect everyone
to reread the whole file in a new release...
* The read of Insert->RedoRecPtr while rechecking whether it's out of
date now is unlocked, is that correct? And why?
Same here, XLogInsert holds the slot while rechecking Insert->RedoRecPtr.
I was wondering whether it's guaranteed that we don't read a cached
value, but I didn't think of the memory barriers due to the spinlock in
Release/AcquireSlot. I think I thought that release wouldn't acquire the
spinlock at all.
* Have you measured whether it works to just keep as many slots as
max_backends requires around and not bothering with dynamically
allocating them to inserters?
That seems to require to keep actually waiting slots in a separate
list which very well might make that too expensive.
Correctness-wise the biggest problem for that probably is the exclusive
acquisition of all slots that CreateCheckpoint() does...
Hmm. It wouldn't be much different, each backend would still need to reserve
its own dedicated slot, because it might be held by a backend that
grabbed all the slots. Also, bgwriter and checkpointer need to flush the
WAL, so they'd need slots too.
More slots make WaitXLogInsertionsToFinish() more expensive, as it has to
loop through all of them. IIRC some earlier pgbench tests I ran didn't show
much difference in performance, whether there were 40 slots or 100, as long
as there were enough of them. I can run some more tests with even more slots
to see how it behaves.
In a very quick test I ran previously I did see the slot acquisition in
the profile when using short connections that all did some quick inserts
- which isn't surprising since they tend to all start with the same
slot so they will all fight for the first slots.
Maybe we could ameliorate that by initializing MySlotNo to
(MyProc->pgprocno % num_xloginsert_slots) when uninitialized? That won't
be completely fair since the number of procs won't usually be a multiple
of num_xloginsert_slots, but I doubt that will be problematic.
* What about using some sort of linked list of unused slots for
WALInsertSlotAcquireOne? Every time we're done we put it to the *end*
of the list so it's less likely to have been grabbed by somebody else
so we can reuse it.
a) To grab a new one go to the head of the linked list spinlock it and
recheck whether it's still free. If not, restart. Otherwise, remove
from list.
b) To reuse a previously used slot
That way we only have to do the PGSemaphoreLock() dance if there
really aren't any free slots.
That adds a spinlock acquisition / release into the critical path, to
protect the linked list. I'm trying very hard to avoid serialization points
like that.
Hm. We already have at least one spinlock in that path? Or are you
thinking of a global one protecting the linked list? If so, I think we
should be able to get away with locking individual slots.
IIRC if you never need to delete list elements you can even get away
with a lockless implementation.
While profiling the tests I've done, finding a free slot hasn't shown up
much. So I don't think it's a problem performance-wise as it is, and I don't
think it would make the code simpler.
It sure wouldn't make it simpler. As I said above, I saw the slot
acquisition in a profile when using -C and a short pgbench script (that
just inserts 10 rows).
* The queuing logic around slots seems to lack documentation. It's
complex enough to warrant that imo.
Yeah, it's complex. I expanded the comments above XLogInsertSlot, to explain
how xlogInsertingAt works. Did that help, or was it some other part
that needs more docs?
What I don't understand so far is why we don't reset xlogInsertingAt
during SlotReleaseOne. There are places documenting that we don't do so,
but not why, unless I missed it.