On 27.06.2013 15:27, Andres Freund wrote:
> Hi,

> On 2013-06-26 18:52:30 +0300, Heikki Linnakangas wrote:
>>> * Could you document the way slots prevent checkpoints from occurring
>>>   when XLogInsert rechecks for full page writes? I think it's correct -
>>>   but not very obvious at a glance.

>> There's this in the comment near the top of the file:
>>
>>  * To update RedoRecPtr or fullPageWrites, one has to make sure that all
>>  * subsequent inserters see the new value. This is done by reserving all the
>>  * insertion slots before changing the value. XLogInsert reads RedoRecPtr and
>>  * fullPageWrites after grabbing a slot, so by holding all the slots
>>  * simultaneously, you can ensure that all subsequent inserts see the new
>>  * value.  Those fields change very seldom, so we prefer to be fast and
>>  * non-contended when they need to be read, and slow when they're changed.
>>
>> Does that explain it well enough? XLogInsert holds onto a slot while it
>> rechecks for full page writes.

> Yes. Earlier on that was explained in XLogInsert() - maybe point to the
> documentation on top of the file? The file is too big to expect everyone
> to reread the whole file in a new release...
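To make the pattern concrete, here's a minimal standalone sketch of the
idea (illustrative only - pthread mutexes stand in for the slot
acquire/release primitives; this is not the actual xlog.c code):

    #include <pthread.h>

    #define NUM_SLOTS 8

    /* one lock per insertion slot; pthread_mutex_init() each at startup */
    static pthread_mutex_t slot[NUM_SLOTS];
    static int fullPageWrites;      /* the seldom-changed shared value */

    /*
     * Writer: grab every slot, so no insertion is in progress and any
     * inserter that starts afterwards is guaranteed to see the new value.
     */
    void
    update_full_page_writes(int newval)
    {
        for (int i = 0; i < NUM_SLOTS; i++)
            pthread_mutex_lock(&slot[i]);
        fullPageWrites = newval;
        for (int i = NUM_SLOTS - 1; i >= 0; i--)
            pthread_mutex_unlock(&slot[i]);
    }

    /*
     * Inserter: acquire one slot first, then read the value with no
     * further locking. A concurrent update cannot complete while we hold
     * our slot, and acquiring the slot orders our read after any update
     * that completed before it.
     */
    int
    read_full_page_writes(int myslot)
    {
        int     fpw;

        pthread_mutex_lock(&slot[myslot]);
        fpw = fullPageWrites;
        /* ... insert the WAL record, honouring fpw ... */
        pthread_mutex_unlock(&slot[myslot]);
        return fpw;
    }

Holding the slot across the recheck is what keeps a checkpoint's update
from sneaking in between the check and the insertion.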

>>> * The read of Insert->RedoRecPtr while rechecking whether it's out of
>>>   date now is unlocked, is that correct? And why?

>> Same here, XLogInsert holds the slot while rechecking Insert->RedoRecPtr.

> I was wondering whether it's guaranteed that we don't read a cached
> value, but I didn't think of the memory barriers due to the spinlock in
> Release/AcquireSlot. I think I thought that release wouldn't acquire the
> spinlock at all.
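In C11-atomics terms, here's an illustrative standalone sketch (not
PostgreSQL's actual s_lock.h implementation) of why the unlocked re-read
can't return a stale value - the slot lock's acquire/release pairing
provides the barriers:

    #include <stdatomic.h>
    #include <stdint.h>

    typedef uint64_t XLogRecPtr;

    static XLogRecPtr RedoRecPtr_shared;  /* read without its own lock */
    static atomic_int slot_lock;          /* stand-in for a slot spinlock */

    void
    acquire_slot(void)
    {
        int expected = 0;

        /* Acquire semantics: loads after this cannot observe memory
         * older than what the previous holder wrote before its release. */
        while (!atomic_compare_exchange_weak_explicit(&slot_lock,
                                                      &expected, 1,
                                                      memory_order_acquire,
                                                      memory_order_relaxed))
            expected = 0;       /* CAS failure overwrote it; retry */
    }

    void
    release_slot(void)
    {
        atomic_store_explicit(&slot_lock, 0, memory_order_release);
    }

    XLogRecPtr
    recheck_redo_ptr(void)
    {
        /* Caller holds a slot. The checkpointer updated the value while
         * holding *all* slot locks, so its release of our slot
         * synchronizes-with our acquire: this plain load is not stale. */
        return RedoRecPtr_shared;
    }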

>>> * Have you measured whether it works to just keep as many slots as
>>>   max_backends requires around and not bother with dynamically
>>>   allocating them to inserters?
>>>   That seems to require keeping actually-waiting slots in a separate
>>>   list, which very well might make it too expensive.
>>>
>>>   Correctness-wise the biggest problem for that is probably the
>>>   exclusive acquisition of all slots that CreateCheckpoint() does...

>> Hmm. It wouldn't be much different; each backend would still need to
>> reserve its own dedicated slot, because it might be held by a backend
>> that grabbed all the slots. Also, bgwriter and checkpointer need to
>> flush the WAL, so they'd need slots too.
>>
>> More slots make WaitXLogInsertionsToFinish() more expensive, as it has
>> to loop through all of them. IIRC some earlier pgbench tests I ran
>> didn't show much difference in performance, whether there were 40 slots
>> or 100, as long as there were enough of them. I can run some more tests
>> with even more slots to see how it behaves.
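To illustrate that cost: the waiter has to visit every slot, in use or
not, so the scan is linear in the slot count. Schematically (simplified
stand-in code, not the patch's actual WaitXLogInsertionsToFinish):

    #include <pthread.h>
    #include <stdint.h>

    typedef struct
    {
        pthread_mutex_t mutex;        /* stand-in for the slot spinlock */
        uint64_t        insertingAt;  /* 0 = slot idle */
    } InsertSlot;

    /*
     * Find how far the WAL is known to be fully copied: the minimum
     * advertised position among in-progress inserters. The loop is
     * O(nslots), so doubling the slot count doubles this scan even when
     * most slots are idle.
     */
    static uint64_t
    scan_insertions(InsertSlot *slots, int nslots, uint64_t upto)
    {
        uint64_t finished = upto;

        for (int i = 0; i < nslots; i++)
        {
            uint64_t at;

            pthread_mutex_lock(&slots[i].mutex);
            at = slots[i].insertingAt;
            pthread_mutex_unlock(&slots[i].mutex);

            if (at != 0 && at < finished)
                finished = at;  /* can't report flush beyond this point */
        }
        return finished;
    }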

> In a very quick test I ran previously, I did see the slot acquisition in
> the profile when using short connections that all did some quick inserts
> - which isn't surprising, since they all tend to start with the same
> slot, so they will all fight for the first slots.
>
> Maybe we could ameliorate that by initializing MySlotNo to
> (MyProc->pgprocno % num_xloginsert_slots) when uninitialized? That won't
> be completely fair, since the number of procs won't usually be a
> multiple of num_xloginsert_slots, but I doubt that will be problematic.
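Concretely, the suggestion would amount to something like this (the
scaffolding around the names used above is hypothetical):

    /*
     * Hypothetical sketch of the suggestion: derive each backend's
     * starting slot from its pgprocno, so backends don't all begin
     * their search at slot 0.
     */
    #define SLOT_UNINITIALIZED (-1)

    static int MySlotNo = SLOT_UNINITIALIZED;   /* backend-local */

    static int
    initial_slot(int my_pgprocno, int num_xloginsert_slots)
    {
        if (MySlotNo == SLOT_UNINITIALIZED)
            MySlotNo = my_pgprocno % num_xloginsert_slots;
        return MySlotNo;
    }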

>>> * What about using some sort of linked list of unused slots for
>>>   WALInsertSlotAcquireOne? Every time we're done, we put the slot at
>>>   the *end* of the list, so it's less likely to have been grabbed by
>>>   somebody else and we can reuse it.
>>>   a) To grab a new one, go to the head of the linked list, spinlock
>>>   it, and recheck whether it's still free. If not, restart. Otherwise,
>>>   remove it from the list.
>>>   b) To reuse a previously used slot
>>>
>>>   That way we only have to do the PGSemaphoreLock() dance if there
>>>   really aren't any free slots.

>> That adds a spinlock acquisition / release into the critical path, to
>> protect the linked list. I'm trying very hard to avoid serialization
>> points like that.

> Hm. We already have at least one spinlock in that path? Or are you
> thinking of a global one protecting the linked list? If so, I think we
> should be able to get away with locking individual slots.
> IIRC, if you never need to delete list elements, you can even get away
> with a lockless implementation.
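A lock-free free-slot list along those lines could look like the
following Treiber-stack sketch. It's illustrative only: it pushes at the
head rather than at the end of the list as suggested above, and a
production version would have to deal with the ABA problem (e.g. with a
version-tagged head pointer):

    #include <stdatomic.h>
    #include <stddef.h>

    typedef struct FreeSlot
    {
        struct FreeSlot *next;
        int              slotno;
    } FreeSlot;

    static _Atomic(FreeSlot *) free_head;   /* NULL = list empty */

    /* Release: push the slot back onto the free list. */
    static void
    push_free_slot(FreeSlot *s)
    {
        FreeSlot *old = atomic_load(&free_head);

        do
            s->next = old;
        while (!atomic_compare_exchange_weak(&free_head, &old, s));
    }

    /*
     * Acquire: pop a slot, or return NULL if none is free (the caller
     * would then fall back to the PGSemaphoreLock() dance). Note the
     * old->next read is where ABA bites a naive version like this one.
     */
    static FreeSlot *
    pop_free_slot(void)
    {
        FreeSlot *old = atomic_load(&free_head);

        while (old != NULL &&
               !atomic_compare_exchange_weak(&free_head, &old, old->next))
            ;   /* CAS failure refreshes 'old'; retry */
        return old;
    }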

>> While profiling the tests I've done, finding a free slot hasn't shown
>> up much. So I don't think it's a problem performance-wise as it is, and
>> I don't think it would make the code simpler.

> It sure wouldn't make it simpler. As I said above, I saw the slot
> acquisition in a profile when using -C and a short pgbench script (that
> just inserts 10 rows).

>>> * The queuing logic around slots seems to lack documentation. It's
>>>   complex enough to warrant that imo.

>> Yeah, it's complex. I expanded the comments above XLogInsertSlot to
>> explain how xlogInsertingAt works. Did that help, or was it some other
>> part that needs more docs?

> What I don't understand so far is why we don't reset xlogInsertingAt
> during SlotReleaseOne. There are places documenting that we don't do
> so, but not why, unless I missed it.

We could reset it in SlotReleaseOne. However, the next inserter that
acquires the slot would then need to set it to a valid value again, in
SlotAcquireOne. Leaving it at its previous value is a convenient way to
have it pre-initialized with some value for the next SlotAcquireOne call
that uses the same slot.

Perhaps that should be changed, just to make the logic easier to
understand; I agree it seems confusing. Another way to initialize
xlogInsertingAt in SlotAcquireOne would be e.g. to use an old value
cached in the same backend. Or we could always initialize it to 1, which
would force any WaitXLogInsertionsToFinish() call to wait for the
inserter until it has finished or updated its value. That might not make
any difference in practice, and it would be less bizarre than carrying
over the previous holder's value.
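As a simplified sketch of the two policies (names loosely after the
patch; a mutex stands in for the slot spinlock):

    #include <pthread.h>
    #include <stdint.h>

    typedef struct
    {
        pthread_mutex_t mutex;      /* stand-in for the slot spinlock */
        uint64_t        xlogInsertingAt;
    } InsertSlot;

    /*
     * Current behaviour: SlotReleaseOne leaves xlogInsertingAt alone, so
     * SlotAcquireOne inherits the previous user's last position as a
     * "plausible" initial value and writes nothing.
     *
     * Alternative: advertise the sentinel 1 on acquire. 1 is before any
     * valid WAL position, so WaitXLogInsertionsToFinish() conservatively
     * waits on this slot until the inserter publishes a real position
     * (or releases the slot).
     */
    static void
    slot_acquire_with_sentinel(InsertSlot *s)
    {
        pthread_mutex_lock(&s->mutex);
        s->xlogInsertingAt = 1;
        pthread_mutex_unlock(&s->mutex);
    }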

> Do we do this only to have some plausible value for a slot that has
> been acquired but hasn't had any data copied yet?

Yes, exactly.

> If so, why isn't it sufficient to initialize it in
> ReserveXLogInsertLocation?

It would be, but then ReserveXLogInsertLocation would need to hold the
slot's spinlock at the same time as insertpos_lck, so that it could
atomically read the current CurrBytePos value and copy it to
xlogInsertingAt. It's important to keep ReserveXLogInsertLocation() as
lightweight as possible, to maximize concurrency.
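Schematically, initializing it there would nest the slot's lock inside
insertpos_lck in the hottest path (simplified, hypothetical names,
reusing the InsertSlot stand-in from the sketch above):

    /*
     * What initializing xlogInsertingAt in ReserveXLogInsertLocation
     * would require: the slot lock nested inside insertpos_lck, so the
     * advertised position atomically matches the reserved one. That
     * lengthens the patch's hottest critical section.
     */
    static uint64_t
    reserve_and_advertise(InsertSlot *slot, pthread_mutex_t *insertpos_lck,
                          uint64_t *CurrBytePos, uint64_t size)
    {
        uint64_t startpos;

        pthread_mutex_lock(insertpos_lck);
        startpos = *CurrBytePos;
        *CurrBytePos = startpos + size;

        pthread_mutex_lock(&slot->mutex);       /* the extra acquisition */
        slot->xlogInsertingAt = startpos;
        pthread_mutex_unlock(&slot->mutex);

        pthread_mutex_unlock(insertpos_lck);
        return startpos;
    }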

Unless you initialize it to 1 in SlotAcquireOne - then it can be updated
after ReserveXLogInsertLocation() at leisure. But then it might not be
worth the extra spinlock acquisition to update it at all until you
finish the insertion and release the slot (or move to the next page),
making this identical to the idea I pondered above.

I'll do some testing with that. If it performs OK, initializing
xlogInsertingAt to 1 is clearer than leaving it at its old value.

- Heikki

