[HACKERS] Re: Advice on MyXactMade* flags, MyLastRecPtr, pendingDeletes and lazy XID assignment

2007-08-30 Thread Florian G. Pflug

Gregory Stark wrote:

"Florian G. Pflug" <[EMAIL PROTECTED]> writes:


It seems doable, but it's not pretty. One possible scheme would be to
emit a record *after* choosing a name but *before* creating the file,

No, because the way you know the name is good is a successful
open(O_CREAT).

The idea was to log *twice*: once when we're about to create a file, and
a second time once we have succeeded. That way, the filename shows up in the
log even if we crash immediately after physically creating the file, which
gives recovery at least a chance to clean up the mess.


It sounds like, if the reason it fails is that someone else created the same
file name, you'll delete the wrong file?


Careful bookkeeping during recovery should be able to eliminate that risk,
I think. I've thought a bit more about this, and came up with the following
idea, which also takes checkpoints into account.

We keep a global table of (xid, filename) pairs in shared memory. File creation
becomes
  1) Generate a new filename.
  2) Add (CurrentTransactionId, filename) to the list, emit an XLOG record
     saying we did this, and flush the log. If the filename is already on
     the list, start over at (1).
  3) Create the file. If this fails, delete the list entry and the file,
     and start over at (1).
  4) On (sub) transaction ABORT, we remove entries with the xids we abort,
     and delete the files.
  5) On top transaction COMMIT, we remove entries with the xids we commit,
     and keep the files.
  6) During top transaction PREPARE, we record the entries with matching xids
     in the 2PC state file.
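
To make the steps above concrete, here is a rough toy model in Python. It is
only a sketch under assumptions: FileTracker, the "CREATE_INTENT" record name,
the rel_N naming and the in-memory wal list (where an append stands in for
"emit and flush") are all made up for illustration and are not PostgreSQL code.
One deliberate deviation from step 3: when the create fails because the file
already exists, the sketch leaves that file alone, since it was not created here.

```python
import os
import tempfile


class FileTracker:
    """Toy model of the shared (xid, filename) table. All names are hypothetical."""

    def __init__(self, directory):
        self.directory = directory
        self.pending = {}   # filename -> xid that registered it
        self.wal = []       # stand-in for the XLOG; append == "emit and flush"
        self.counter = 0

    def _new_name(self):
        self.counter += 1
        return f"rel_{self.counter}"

    def create_file(self, xid):
        while True:
            name = self._new_name()                       # step 1
            if name in self.pending:                      # step 2: already listed
                continue
            self.pending[name] = xid
            self.wal.append(("CREATE_INTENT", xid, name))
            path = os.path.join(self.directory, name)
            try:
                # O_EXCL makes the create fail if the file already exists.
                fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            except FileExistsError:
                # Step 3: drop the entry and retry. The existing file is NOT
                # deleted here, because this transaction did not create it.
                del self.pending[name]
                continue
            os.close(fd)
            return name

    def abort(self, xids):
        # Step 4: forget the entries of the aborted xids and delete their files.
        for name in [n for n, x in self.pending.items() if x in xids]:
            del self.pending[name]
            os.remove(os.path.join(self.directory, name))

    def commit(self, xids):
        # Step 5: forget the entries of the committed xids, keep the files.
        for name in [n for n, x in self.pending.items() if x in xids]:
            del self.pending[name]

    def prepare(self, xid):
        # Step 6: the entries that would be written to the 2PC state file.
        return sorted(n for n, x in self.pending.items() if x == xid)
```

Note that the intent record for a name that later turns out to be taken stays
in the log; recovery would have to cope with that, which is the case Gregory's
question is about.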

When creating a checkpoint, we include the global filelist in the checkpoint. We
might need some interlock to ensure that concurrent global filelist updates
don't get lost - but maybe doing things in the correct order is sufficient to
guarantee this.
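
One possible interlock, sketched in Python purely as an illustration: take the
same lock for "add entry + emit record" and for "snapshot list + emit
checkpoint". Then every entry is either in the checkpoint's list or has its
record after the checkpoint record in the log. SharedFileList and the record
names are hypothetical, and a single lock is just an assumption of this sketch;
whether ordering alone would be enough is left open above.

```python
import threading


class SharedFileList:
    """Toy global filelist guarded by one lock. Hypothetical names throughout."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}  # filename -> xid

    def add(self, xid, filename, wal):
        # Table update and log record happen atomically w.r.t. checkpoints.
        with self._lock:
            if filename in self._entries:
                return False
            self._entries[filename] = xid
            wal.append(("CREATE_INTENT", xid, filename))
            return True

    def checkpoint(self, wal):
        # Snapshot and checkpoint record are taken under the same lock, so no
        # concurrent add() can fall between the two.
        with self._lock:
            snapshot = dict(self._entries)
            wal.append(("CHECKPOINT", snapshot))
        return snapshot
```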

During recovery, we track the fate of the files in a similar (but local) list.
 .) We initialize our local tracking list with the one found in the latest
CHECKPOINT.
 .) When we encounter a COMMIT record, we remove all files with xids matching
those in the COMMIT record without deleting them.
 .) When we encounter a PREPARE record, we remove all files with matching xids,
and record them in the 2PC state file. They are deleted if the PREPARED
transaction is aborted.
 .) When we encounter an ABORT record, we remove all files with matching xids
from the list, and delete them.
 .) When we encounter a runtime CHECKPOINT, its list should match our tracking
list.
 .) When we encounter a shutdown CHECKPOINT, we remove all files from our local
list that are not in the checkpoint's list, and delete those files.
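
The recovery rules above could look roughly like this as a replay loop. Again
only a sketch: the record tuples, replay() and the delete_file callback are
invented for illustration. Three things are assumptions beyond what the list
says: file-creation records add to the tracking list (implied by step 2 of the
creation protocol), PREPARE carries a single xid, and a COMMIT PREPARED simply
forgets the files. What to do with entries still tracked at the end of the log
is not spelled out above, so the sketch just returns them.

```python
def replay(records, delete_file):
    """Replay toy log records; records[0] must be the latest checkpoint."""
    first = records[0]
    assert first[0] == "CHECKPOINT"
    tracked = dict(first[1])   # filename -> xid, seeded from the checkpoint
    prepared = {}              # xid -> filenames, stand-in for 2PC state files

    def take(xids):
        # Remove and return all tracked files owned by the given xids.
        names = [n for n, x in tracked.items() if x in xids]
        for n in names:
            del tracked[n]
        return names

    for rec in records[1:]:
        kind = rec[0]
        if kind == "CREATE_INTENT":
            _, xid, name = rec
            tracked[name] = xid
        elif kind == "COMMIT":
            take(set(rec[1]))                      # keep the files
        elif kind == "ABORT":
            for name in take(set(rec[1])):
                delete_file(name)
        elif kind == "PREPARE":
            prepared[rec[1]] = take({rec[1]})      # move to 2PC state
        elif kind == "ABORT_PREPARED":
            for name in prepared.pop(rec[1], []):
                delete_file(name)
        elif kind == "COMMIT_PREPARED":
            prepared.pop(rec[1], None)
        elif kind == "CHECKPOINT":
            _, listed, is_shutdown = rec
            if is_shutdown:
                # Anything we track that the checkpoint does not list is a leak.
                for name in [n for n in tracked if n not in listed]:
                    del tracked[name]
                    delete_file(name)
            elif tracked != listed:
                raise ValueError("runtime checkpoint list differs from tracking list")
    return tracked, prepared
```

Each record is a tuple such as ("CHECKPOINT", {filename: xid}, is_shutdown),
("CREATE_INTENT", xid, filename), ("COMMIT", [xids]) or ("PREPARE", xid).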

The XLOG flush in step (2) is pretty nasty, but I think any solution that
is guaranteed to prevent leaks will have to flush something to disk at that
point. The global table isn't too appealing either, because it will limit
how many concurrent transactions will be able to create files. It could be
replaced by some on-disk thing, though.

This solution sounds rather heavy-weight, but I thought I'd share the idea.

Back to work on lazy xid assignment now ;-)

greetings, Florian Pflug



[HACKERS] Re: Advice on MyXactMade* flags, MyLastRecPtr, pendingDeletes and lazy XID assignment

2007-08-29 Thread Simon Riggs

On Wed, 2007-08-29 at 19:32 +0200, Florian G. Pflug wrote:

> I propose to do the following in my lazy XID assignment patch 

The lazy XID assignment seems to be the key to unlocking this whole
area.

> - can
> anyone see a hole in that?
> 
> .) Get rid of MyLastRecPtr and MyXactMadeTempRelUpdates. Those are
> superseded by TransactionIdIsValid(GetCurrentTransactionIdIfAny()).
> .) Get rid of MyXactMadeXLogEntry. Instead, just reset ProcLast
> .) Rename ProcLastRecEnd to XactLastRecEnd, and reset when starting
> a new toplevel transaction.

I followed you up to this point. Nothing bad springs immediately to
mind, but please can you explain the proposals rather than just assert
them and ask us to find the holes? 
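
For readers following along, one reading of the first bullet is that, with lazy
assignment, "this transaction has an XID" can stand in for "this transaction
made changes". The toy model below illustrates only that idea. Xact, made_changes
and the starting XID value are hypothetical; the two accessor methods merely
mimic the roles of GetCurrentTransactionId() and GetCurrentTransactionIdIfAny(),
and this is not how the patch is implemented.

```python
INVALID_XID = 0  # plays the role of InvalidTransactionId


class Xact:
    """Toy transaction that gets an XID only when one is first needed."""

    _next_xid = 1000  # arbitrary starting point for the sketch

    def __init__(self):
        self.xid = INVALID_XID

    def get_current_xid(self):
        # Assign lazily: the first caller that needs an XID triggers assignment.
        if self.xid == INVALID_XID:
            self.xid = Xact._next_xid
            Xact._next_xid += 1
        return self.xid

    def get_current_xid_if_any(self):
        # Never assigns; returns INVALID_XID for a transaction without an XID.
        return self.xid


def xid_is_valid(xid):
    return xid != INVALID_XID


def made_changes(xact):
    # Stand-in for the separate per-transaction flags in the quoted proposal.
    return xid_is_valid(xact.get_current_xid_if_any())
```

A read-only transaction never calls get_current_xid(), so made_changes() stays
false for it without any extra bookkeeping.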

> I think we might go even further, and *never* flush the XLOG on abort,
> since if we crash just before the abort, nothing gets logged either. But
> if we leak the leftover files in such a case, that's probably a bad idea.

That doesn't gain us much, yet we lose the ability to tell the
difference between an abort and a crash.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com

