Re: [PATCHES] Transaction Guarantee, updated version
Quick progress report: On Thu, 2007-06-21 at 16:05 -0400, Tom Lane wrote: > "Simon Riggs" <[EMAIL PROTECTED]> writes: > > Completed all of the agreed changes for TG: All comments changed, plus some refactoring of code. > Can we fix it to be a read test instead of a write test, that is, if > we know WAL has been flushed through the target LSN, it's safe to set > the hint bit, else not? Still working on this, but I'm getting closer. I've got the main shape of it now, but I need to look at it with fresh eyes, so another day on this methinks. I've got an array of LSNs per clog page, as previously described on hackers, rather than just one per page as before. > In general, I think a transaction abort should not need to flush > anything, since the default assumption is that it crashed anyway. > Hence for instance recording a transaction abort needn't advance > the LSN of the clog page. (You seem to have it flushing through > the last xlog record written by the backend, which is exactly what > it doesn't need to do.) By extension, it should be OK to set INVALID > (aborted) hint bits in a tuple header without any concerns about > flushing. Done > The caching logic in TransactionGetCommitLSN is obviously broken. Fixed > Is there really a use-case for adding a pgstat counter for "guaranteed" > transactions? That adds pgstat overhead, and bloats the patch > noticeably, and I don't entirely see the value of it. Removed > There's some padding junk inserted in XLogCtlData, which as far as I > recall was never discussed, and is certainly not an integral part of the > delayed-commit feature. If you want that you should submit and defend > it separately. Done -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] Transaction Guarantee, updated version
"Simon Riggs" <[EMAIL PROTECTED]> writes: > On Thu, 2007-06-21 at 16:05 -0400, Tom Lane wrote: >> Can we fix it to be a read test instead of a write test, that is, if >> we know WAL has been flushed through the target LSN, it's safe to set >> the hint bit, else not? > Yes, that's roughly how it worked in v1. > The various possible strategies for handling hint bits were: > 1. flush WAL to current insert pointer - very heavy perf hit > 2. defer setting the hint bits at all, if the written transactions are > recently written. To do this we need a cache of recently written > deferred commit Xids. > 3. flush WAL up to the LSN of the Xid, if it is a deferred commit. To do > this we need a cache of recently written deferred commit Xids. I think you misunderstood me: 4. If WAL has not been flushed far enough to be certain that the target transaction's commit is synced, then defer setting the hint bit. This does not require any new per-transaction state, we can just measure it on the basis of the lsn associated with the clog page that the transaction's state is in. Compared to #2, this gets rid of complexity of bookkeeping at the cost of not being able to set hint bits as soon as we otherwise could. I think both #1 and #3 would lose most of the advantage of deferred commit, as it is not at all uncommon for a transaction's effects to be examined by the following transaction, so you'd be flushing WAL immediately just to set hint bits. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PATCHES] Transaction Guarantee, updated version
On Fri, 2007-06-22 at 10:22 -0400, Tom Lane wrote: > 4. If WAL has not been flushed far enough to be certain that the target > transaction's commit is synced, then defer setting the hint bit. This > does not require any new per-transaction state, we can just measure it > on the basis of the lsn associated with the clog page that the > transaction's state is in. Seems like a great idea, and avoids that annoying DFC cache. I'll do that. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PATCHES] Transaction Guarantee, updated version
On Thu, 2007-06-21 at 16:05 -0400, Tom Lane wrote: > "Simon Riggs" <[EMAIL PROTECTED]> writes: > > Completed all of the agreed changes for TG: > > I've just realized that there's a fatal problem with this design. > We've now got tqual.c setting page LSN when it holds only share lock > on the buffer. That will absolutely not work, eg two backends might > concurrently set different values and end up with garbage (since it's > unlikely that LSN store is atomic). > > Can we fix it to be a read test instead of a write test, that is, if > we know WAL has been flushed through the target LSN, it's safe to set > the hint bit, else not? Yes, that's roughly how it worked in v1. The various possible strategies for handling hint bits were: 1. flush WAL to current insert pointer - very heavy perf hit 2. defer setting the hint bits at all, if the written transactions are recently written. To do this we need a cache of recently written deferred commit Xids. 3. flush WAL up to the LSN of the Xid, if it is a deferred commit. To do this we need a cache of recently written deferred commit Xids. Any more? v1 implemented (2) on the basis that it was a small timing hole that was best handled by doing as little as possible, though we could easily implement (3) instead. But you need the deferred commit cache either way, AFAICS. So proposal is: Maintain cache of unflushed deferred commit xacts and their corresponding LSNs. When you need to set a commit bit you check the required LSN for the xact from cache, then flush WAL up to that LSN. If xact doesn't exist, then do nothing because it is either an already flushed deferred commit or a normal commit (therefore already flushed too). Don't update the page's LSN at all. More code, but it works correctly and efficiently. This is pretty much putting back most of v1, though. > In general, I think a transaction abort should not need to flush > anything, since the default assumption is that it crashed anyway. It currently does that though and I haven't sought to change that. > Hence for instance recording a transaction abort needn't advance > the LSN of the clog page. (You seem to have it flushing through > the last xlog record written by the backend, which is exactly what > it doesn't need to do.) By extension, it should be OK to set INVALID > (aborted) hint bits in a tuple header without any concerns about > flushing. Sure. > Also, I'm sort of wondering if we really need a separate walwriter > process; that code seems awfully duplicative. Is there a reason > not to have the bgwriter include this functionality? The bgwriter is doing something else already. Fsyncing the WAL takes significant time and that would detract from the main role of bgwriter. > In lesser news: > > The caching logic in TransactionGetCommitLSN is obviously broken. OK, but won't check because of what you've said at top of mail. No need to fix that if the basic premise needs changing anyway. > Is there really a use-case for adding a pgstat counter for "guaranteed" > transactions? That adds pgstat overhead, and bloats the patch > noticeably, and I don't entirely see the value of it. I'm easy on that: its easier to take code out than add it in. It certainly helped to debug things. > There's some padding junk inserted in XLogCtlData, which as far as I > recall was never discussed, and is certainly not an integral part of the > delayed-commit feature. If you want that you should submit and defend > it separately. OK -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] Transaction Guarantee, updated version
"Simon Riggs" <[EMAIL PROTECTED]> writes: > Completed all of the agreed changes for TG: I've just realized that there's a fatal problem with this design. We've now got tqual.c setting page LSN when it holds only share lock on the buffer. That will absolutely not work, eg two backends might concurrently set different values and end up with garbage (since it's unlikely that LSN store is atomic). Can we fix it to be a read test instead of a write test, that is, if we know WAL has been flushed through the target LSN, it's safe to set the hint bit, else not? In general, I think a transaction abort should not need to flush anything, since the default assumption is that it crashed anyway. Hence for instance recording a transaction abort needn't advance the LSN of the clog page. (You seem to have it flushing through the last xlog record written by the backend, which is exactly what it doesn't need to do.) By extension, it should be OK to set INVALID (aborted) hint bits in a tuple header without any concerns about flushing. Also, I'm sort of wondering if we really need a separate walwriter process; that code seems awfully duplicative. Is there a reason not to have the bgwriter include this functionality? In lesser news: The caching logic in TransactionGetCommitLSN is obviously broken. Is there really a use-case for adding a pgstat counter for "guaranteed" transactions? That adds pgstat overhead, and bloats the patch noticeably, and I don't entirely see the value of it. There's some padding junk inserted in XLogCtlData, which as far as I recall was never discussed, and is certainly not an integral part of the delayed-commit feature. If you want that you should submit and defend it separately. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match