Re: [PATCHES] Transaction Guarantee, updated version

2007-07-13 Thread Simon Riggs
Quick progress report:

On Thu, 2007-06-21 at 16:05 -0400, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  Completed all of the agreed changes for TG:

All comments changed, plus some refactoring of code.

 Can we fix it to be a read test instead of a write test, that is, if
 we know WAL has been flushed through the target LSN, it's safe to set
 the hint bit, else not?

Still working on this, but I'm getting closer.
I've got the main shape of it now, but I need to look at it with fresh
eyes, so another day on this methinks.

I've got an array of LSNs per clog page, as previously described on
hackers, rather than just one per page as before.

 In general, I think a transaction abort should not need to flush
 anything, since the default assumption is that it crashed anyway.
 Hence for instance recording a transaction abort needn't advance
 the LSN of the clog page.  (You seem to have it flushing through
 the last xlog record written by the backend, which is exactly what
 it doesn't need to do.)  By extension, it should be OK to set INVALID
 (aborted) hint bits in a tuple header without any concerns about
 flushing.

Done

 The caching logic in TransactionGetCommitLSN is obviously broken.

Fixed

 Is there really a use-case for adding a pgstat counter for guaranteed
 transactions?  That adds pgstat overhead, and bloats the patch
 noticeably, and I don't entirely see the value of it.

Removed

 There's some padding junk inserted in XLogCtlData, which as far as I
 recall was never discussed, and is certainly not an integral part of the
 delayed-commit feature.  If you want that you should submit and defend
 it separately.

Done

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] Transaction Guarantee, updated version

2007-06-22 Thread Simon Riggs
On Thu, 2007-06-21 at 16:05 -0400, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  Completed all of the agreed changes for TG:
 
 I've just realized that there's a fatal problem with this design.
 We've now got tqual.c setting page LSN when it holds only share lock
 on the buffer.  That will absolutely not work, eg two backends might
 concurrently set different values and end up with garbage (since it's
 unlikely that LSN store is atomic).
 
 Can we fix it to be a read test instead of a write test, that is, if
 we know WAL has been flushed through the target LSN, it's safe to set
 the hint bit, else not?

Yes, that's roughly how it worked in v1.

The various possible strategies for handling hint bits were:
1. flush WAL to current insert pointer - very heavy perf hit
2. defer setting the hint bits at all, if the written transactions are
recently written. To do this we need a cache of recently written
deferred commit Xids.
3. flush WAL up to the LSN of the Xid, if it is a deferred commit. To do
this we need a cache of recently written deferred commit Xids.

Any more?

v1 implemented (2) on the basis that it was a small timing hole that was
best handled by doing as little as possible, though we could easily
implement (3) instead. But you need the deferred commit cache either
way, AFAICS.

So proposal is: Maintain cache of unflushed deferred commit xacts and
their corresponding LSNs. When you need to set a commit bit you check
the required LSN for the xact from cache, then flush WAL up to that LSN.
If xact doesn't exist, then do nothing because it is either an already
flushed deferred commit or a normal commit (therefore already flushed
too). Don't update the page's LSN at all. More code, but it works
correctly and efficiently.

This is pretty much putting back most of v1, though.

 In general, I think a transaction abort should not need to flush
 anything, since the default assumption is that it crashed anyway.

It currently does that though and I haven't sought to change that.

 Hence for instance recording a transaction abort needn't advance
 the LSN of the clog page.  (You seem to have it flushing through
 the last xlog record written by the backend, which is exactly what
 it doesn't need to do.)  By extension, it should be OK to set INVALID
 (aborted) hint bits in a tuple header without any concerns about
 flushing.

Sure.

 Also, I'm sort of wondering if we really need a separate walwriter
 process; that code seems awfully duplicative.  Is there a reason
 not to have the bgwriter include this functionality?

The bgwriter is doing something else already. Fsyncing the WAL takes
significant time and that would detract from the main role of bgwriter.

 In lesser news:
 
 The caching logic in TransactionGetCommitLSN is obviously broken.

OK, but won't check because of what you've said at top of mail. No need
to fix that if the basic premise needs changing anyway.

 Is there really a use-case for adding a pgstat counter for guaranteed
 transactions?  That adds pgstat overhead, and bloats the patch
 noticeably, and I don't entirely see the value of it.

I'm easy on that: its easier to take code out than add it in. It
certainly helped to debug things.

 There's some padding junk inserted in XLogCtlData, which as far as I
 recall was never discussed, and is certainly not an integral part of the
 delayed-commit feature.  If you want that you should submit and defend
 it separately.

OK

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] Transaction Guarantee, updated version

2007-06-22 Thread Simon Riggs
On Fri, 2007-06-22 at 10:22 -0400, Tom Lane wrote:

 4. If WAL has not been flushed far enough to be certain that the target
 transaction's commit is synced, then defer setting the hint bit.  This
 does not require any new per-transaction state, we can just measure it
 on the basis of the lsn associated with the clog page that the
 transaction's state is in.

Seems like a great idea, and avoids that annoying DFC cache.

I'll do that.


-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] Transaction Guarantee, updated version

2007-06-21 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 Completed all of the agreed changes for TG:

I've just realized that there's a fatal problem with this design.
We've now got tqual.c setting page LSN when it holds only share lock
on the buffer.  That will absolutely not work, eg two backends might
concurrently set different values and end up with garbage (since it's
unlikely that LSN store is atomic).

Can we fix it to be a read test instead of a write test, that is, if
we know WAL has been flushed through the target LSN, it's safe to set
the hint bit, else not?

In general, I think a transaction abort should not need to flush
anything, since the default assumption is that it crashed anyway.
Hence for instance recording a transaction abort needn't advance
the LSN of the clog page.  (You seem to have it flushing through
the last xlog record written by the backend, which is exactly what
it doesn't need to do.)  By extension, it should be OK to set INVALID
(aborted) hint bits in a tuple header without any concerns about
flushing.

Also, I'm sort of wondering if we really need a separate walwriter
process; that code seems awfully duplicative.  Is there a reason
not to have the bgwriter include this functionality?


In lesser news:

The caching logic in TransactionGetCommitLSN is obviously broken.

Is there really a use-case for adding a pgstat counter for guaranteed
transactions?  That adds pgstat overhead, and bloats the patch
noticeably, and I don't entirely see the value of it.

There's some padding junk inserted in XLogCtlData, which as far as I
recall was never discussed, and is certainly not an integral part of the
delayed-commit feature.  If you want that you should submit and defend
it separately.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[PATCHES] Transaction Guarantee, updated version

2007-06-16 Thread Simon Riggs
Completed all of the agreed changes for TG:
- WAL writer now runs always, wal_writer_delay = 50ms (default)
- when transaction_guarantee = off we record the latest LSN 
- when we set xact hint bits we look at the latest LSN
- added database-level stats to show guaranteed commits
- fsync parameter still present
- removed logic in XLogInsert to test half-full wal buffers
- no DFC cache anymore
- retained LSN pass-down to mark clog to allow WAL-flush-before-write

Passed all of an afternoons testing with pgbench and make check.
make check passes with both settings of transaction_guarantee

Re-eyeballed every line of the patch to catch errors

Operating parameters are:
- transaction_guarantee = on | (off) [USERSET]
Normal transactions are guaranteed to have been written to disk before
we report COMMIT success back to the user. When the transaction
guarantee is relaxed we respond to the user that the transaction has
succeeded and defer the write to disk. The walwriter process will ensure
that any unguaranteed transactions are written to disk after at most two
cycles of the wal_writer_delay. This leaves a short window of data loss
that can occur should the database server crash.

- wal_writer_delay = 50ms (10-1000)

For debug, we have
- trace_commit = off | (on)
- trace_bg_flush = off | (on) - in this patch, defaults to **on**, but
this would be set back to off following review

More extensive docs required. Planning a whole new section to explain
why its OK and what it means etc., plus additional section in Perf Tips.
Code has good comments to explain things.

Logic to signal walwriter exists but is not now ever called. Seemed
worth keeping until review, at least.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



tg.v19.tar.gz
Description: application/compressed-tar

---(end of broadcast)---
TIP 6: explain analyze is your friend