Re: [PERFORM] recovery after long delete

2005-04-14 Thread Tom Lane
Markus Bertheau [EMAIL PROTECTED] writes:
 Now if you pull the plug after 2, at startup, pg will go through the
 in-progress txns and mark them as aborted. That's all the recovery in
 this case. All rows are still there. O(1).

Right.  (Actually it's O(checkpoint interval), because we have to make
sure that everything we did since the last checkpoint actually got to
disk --- but in principle, there's zero recovery effort.)

 How does oracle do that? Has all this something to do with mvcc? Why
 does it take oracle so long to recover?

Oracle doesn't do MVCC the same way we do.  They update rows in place
and put the previous version of a row into an undo log.  If the
transaction aborts, they have to go back through the undo log and put
back the previous version of the row.  I'm not real clear on how that
applies to deletions, but I suppose it's the same deal: cost of undoing
a transaction in Oracle is proportional to the number of rows it
changed.  There's also the little problem that the space available for
UNDO logs is limited :-(

As against which, they don't have to VACUUM.  So it's a tradeoff.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] recovery after long delete

2005-04-14 Thread Greg Stark

Markus Bertheau [EMAIL PROTECTED] writes:

 How does oracle do that? Has all this something to do with mvcc? Why
 does it take oracle so long to recover?

Postgres does pessimistic MVCC where it keeps the old versions where they
are in the table. Only after it's committed can they be cleaned up and reused.
So aborting is a noop but committing requires additional cleanup (which is put
off until vacuum runs).

Oracle does optimistic MVCC where it assumes most transactions will commit
and most transactions will be reading mostly committed data. So it immediately
does all the cleanup for the commit. It stores the old version in separate
storage spaces called the rollback segment and redo logs. Committing is a noop
(almost, there are some details, search for delayed block cleanout) whereas
rolling back requires copying back all that old data from the redo logs back
to the table.

Engineering is all about tradeoffs.

-- 
greg


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match