"Tom Lane" <[EMAIL PROTECTED]> writes:

> I've been reflecting a bit about whether the notion of deferred fsync
> for transaction commits is really safe.  The proposed patch tries to
> ensure that no consequences of a committed transaction can reach disk
> before the commit WAL record is fsync'd, but ISTM there are potential
> holes in what it's doing.  In particular the path that concerns me is
> (1) transaction A commits with deferred fsync;
> (2) transaction B observes some effect of A (eg, a committed-good tuple);
> (3) transaction B makes a change that is contingent on the observation.
> If B's changes were to reach disk in advance of A's commit record, we'd
> have a risk of logical inconsistency.  The patch is doing what it can
> to prevent *direct* effects of A from reaching disk before the commit
> record does, but it doesn't (and I think cannot) extend this to indirect
> effects perpetrated by other transactions.  An example of the sort of
> risk I'm worried about is a REINDEX omitting an index entry for a tuple
> that it sees as committed dead by A.
> Now this may be safe anyway, but it requires analysis that I don't
> recall anyone having put forward.  The cases that I can see are:

I think Simon did try to put all this in writing when he first proposed it.
It's worth going through again with the actual implementation to be sure all
the same guarantees hold.

> So I think it's probably all OK, but this is a sufficiently long chain
> of reasoning that it had better be checked over by multiple people and
> recorded as part of the design implications of the patch.  Does anyone
> think any of this is wrong, or too fragile to survive future code
> changes?  Are there cases I've missed?

I think the logic you describe is not quite as subtle as you make it out to
be. Certainly it's a bit surprising at first, but it all boils down to the
basic idea of how transactions and WAL records work: we never allow any other
transaction to see the effects of our transaction until the commit record is
fsynced to WAL.

So now we're poking a hole in that, but we still have to ensure that any
transaction that does see the results of our deferred commit doesn't itself
record any visible effects until both its commit and ours hit WAL. The
essential point in Simon's approach that guarantees this is that when you
fsync, you fsync all work that came before you. So committing a transaction
also commits all deferred commits that it might depend on.
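
To make that ordering argument concrete, here is a toy model of it. This is
only a sketch and not PostgreSQL code: the ToyWAL class and its method names
are made up for illustration, and it assumes a single append-only log where a
flush always covers everything up to a given position.

```python
class ToyWAL:
    """Hypothetical append-only log; a flush covers all earlier records."""

    def __init__(self):
        self.records = []      # records in append order
        self.flushed_upto = 0  # count of records known to be durable

    def append(self, record):
        self.records.append(record)
        return len(self.records)  # position just past this record (toy LSN)

    def flush(self, upto):
        # A flush only moves forward and makes every earlier record durable.
        self.flushed_upto = max(self.flushed_upto, upto)

    def is_durable(self, lsn):
        return lsn <= self.flushed_upto

    def surviving_records(self):
        # What would still be in the log after a crash.
        return self.records[:self.flushed_upto]


wal = ToyWAL()

# (1) Transaction A commits with a deferred fsync: record appended, no flush.
a_commit = wal.append("commit A")
a_durable_before = wal.is_durable(a_commit)

# (2)+(3) Transaction B sees A's effects and makes a dependent change.
b_change = wal.append("B: change contingent on A")
b_commit = wal.append("commit B")

# B commits with an immediate fsync, which flushes up to its own record.
wal.flush(b_commit)
a_durable_after = wal.is_durable(a_commit)
```

In this model A's commit record is not durable right after A commits, but it
becomes durable as soon as B's commit is flushed, because A's record sits
earlier in the same log. B's commit can therefore never survive a crash that
loses A's.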

> BTW: I really dislike the name "transaction guarantee" for the feature;
> it sounds like marketing-speak, not to mention overpromising what we
> can deliver.  Postgres can't "guarantee" anything in the face of
> untrustworthy disk hardware, for instance.  I'd much rather use names
> derived from "deferred commit" or "delayed commit" or some such.

Well, from an implementation point of view we're delaying or deferring the
commit. But from a user's point of view, the important thing to realize is
that a committed transaction could be lost.

Perhaps we shouldn't come up with a new name at all and should just reuse the
fsync variable. That way users of old installs which have fsync=off would
silently get this new behaviour. I'm not sure I like that idea, since I use
fsync=off to run CPU overhead tests here. But from a user's point of view it's
probably the "right" thing. This is really what fsync=off should always have
been doing.

  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
