Tom Lane wrote:
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
I added a spinlock to protect the signaling fields between bgwriter and
backends. The current non-locking approach gets really difficult as the
patch adds two new flags, and both are more important than the existing
That may be, but you could minimize the cost and notational ugliness by
not using the spinlock where you don't have to. Put the sig_atomic_t
fields back the way they were, and many of the uses of the spinlock go
away. All you really need it for is the places where the additional
flags are set or read.
I find it easier to understand if it's used whenever any of the fields
are accessed. You don't need to read and write ckpt_failed and
ckpt_started/ckpt_done in specific order anymore, for example.
Some other comments:
I tend to agree with whoever said upthread that the combination of GUC
variables proposed here is confusing and ugly. It'd make more sense to
have min and max checkpoint rates in KB/s, with the max checkpoint rate
only honored when we are predicting we'll finish before the next
Really? I thought everyone is happy with the current combination, and
that it was just the old trio of parameters controlling the write, nap
and sync phases that was ugly. One particularly nice thing about
defining the duration as a fraction of checkpoint interval is that we
can come up with a reasonable default value that doesn't depend on your
How would a min and max rate work?
Anyone else have an opinion on the parameters?
The flow of control and responsibility between xlog.c, bgwriter.c and
bufmgr.c seems to have reached the breaking point of unintelligibility.
Can you think of a refactoring that would simplify it? We might have
to resign ourselves to this much messiness, but I'd rather not...
The problem we're trying to solve is doing a checkpoint while running
the normal bgwriter activity at the same time. The normal way to do two
things simultaneously is to have two different processes (or threads). I
thought about having a separate checkpoint process, but I gave up on
that thought because the communication needed between backends, bgwriter
and the checkpointer seems like a mess. The checkpointer would need the
pendingOpsTable so that it can do the fsyncs, and it would also need to
receive the forget-messages to that table. We could move that table
entirely to the checkpointer, but since bgwriter is presumably doing
most of the writes, there would be a lot of chatter between bgwriter and
The current approach is like co-operative multitasking. BufferSyncs
yields control to bgwriter every now and then.
The division of labor between xlog.c and other modules is not that bad,
IMO. CreateCheckPoint is the main entry point to create a checkpoint,
and it calls other modules to do their stuff, including bufmgr.c.
---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not