Re: [HACKERS] HOT is applied
On 9/21/07, Merlin Moncure <[EMAIL PROTECTED]> wrote:
> Well, my first round of results are so far not showing the big gains I
> saw with hot in some of the earlier patches... so far, it looks
> approximately to be a wash, although with the reduced need to vacuum.

Let me correct myself here. I did some longer runs and came up with the
following results (these runs are still not really long enough; I need to
run some more tests). The smaller, CPU-bound cases were blowout victories
for hot. The largest run (I/O-bound) was very close, but there were not
enough transactions to force a vacuum, after which hot would probably pull
away by some degree.

One small aside: I am suspicious that the 8.3 improvements to the stats
collector overhead are going to reap big benefits. This merits some extra
investigation, I think.

hardware:
  2x intel [EMAIL PROTECTED], 4 cores total
  8gb ram
  5x 15krpm sas, raid 0, data, xfs
  5x 15krpm sas, raid 0, wal, xfs
  fsync on, asynch commit, partial page writes, autovac on

not-hot build: cvs dated 9/14
hot build: cvs dated 9/21

merlin

(results follow)

* without hot: *

scaling factor: 1
number of clients: 8
number of transactions per client: 25
number of transactions actually processed: 200/200
tps = 2345.225742 (including connections establishing)
tps = 2345.264846 (excluding connections establishing)

transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 8
number of transactions per client: 25
number of transactions actually processed: 200/200
tps = 3038.119776 (including connections establishing)
tps = 3038.185492 (excluding connections establishing)

transaction type: TPC-B (sort of)
scaling factor: 25
number of clients: 8
number of transactions per client: 25
number of transactions actually processed: 200/200
tps = 3671.987348 (including connections establishing)
tps = 3672.083077 (excluding connections establishing)

transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 8
number of transactions per client: 25
number of transactions actually processed: 200/200
tps = 4240.424756 (including connections establishing)
tps = 4240.542851 (excluding connections establishing)

transaction type: TPC-B (sort of)
scaling factor: 1000
number of clients: 8
number of transactions per client: 10
number of transactions actually processed: 80/80
tps = 889.890173 (including connections establishing)
tps = 889.905128 (excluding connections establishing)

* with hot: *

transaction type: TPC-B (sort of)
scaling factor: 1
number of clients: 8
number of transactions per client: 25
number of transactions actually processed: 200/200
tps = 3186.553423 (including connections establishing)
tps = 3186.622178 (excluding connections establishing)

transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 8
number of transactions per client: 25
number of transactions actually processed: 200/200
tps = 5123.153371 (including connections establishing)
tps = 5123.331343 (excluding connections establishing)

transaction type: TPC-B (sort of)
scaling factor: 25
number of clients: 8
number of transactions per client: 25
number of transactions actually processed: 200/200
tps = 4994.897937 (including connections establishing)
tps = 4995.075480 (excluding connections establishing)

transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 8
number of transactions per client: 25
number of transactions actually processed: 200/200
tps = 4675.778153 (including connections establishing)
tps = 4675.936096 (excluding connections establishing)

transaction type: TPC-B (sort of)
scaling factor: 1000
number of clients: 8
number of transactions per client: 10
number of transactions actually processed: 80/80
tps = 893.904762 (including connections establishing)
tps = 893.919032 (excluding connections establishing)
Re: [HACKERS] HOT is applied
On Friday 21 September 2007 13:02, Tom Lane wrote:
> I wrote:
>> Dunno about more general, but your idea reduces the runtime of this
>> example by about 50% (22.2s to 10.5s) for me. I'm worried though that
>> it would be a net negative in more typical situations, especially if
>> you've got a lot of open subtransactions.
>
> Actually ... the only way that TransactionIdIsCurrentTransactionId can
> take a meaningful amount of time is if you've got lots of
> subtransactions, and in that case your own subxids cache has certainly
> overflowed, which is likely to force TransactionIdIsInProgress into
> the slow answer path. But if we use TransactionIdIsCurrentTransactionId
> to handle our own xids then we can just ignore MyProc altogether inside
> the loop, thus very possibly (if we are the only overflowed-subxids
> proc) saving us from going through the slow answer path. So on
> reflection I can't see how this isn't a win. I'll clean it up and
> apply it.

Just curious, but does this apply to subtransactions that are the result
of plpgsql try...exception blocks? And how many is "lots"? Presumably
looping 50k times in a single procedure would be enough?

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Re: [HACKERS] HOT is applied
Robert Treat <[EMAIL PROTECTED]> writes:
> Just curious, but does this apply to subtransactions that are the
> result of plpgsql try...exception blocks?

Only if they changed the database; else they won't have XIDs.

			regards, tom lane
Re: [HACKERS] HOT is applied
Heikki Linnakangas wrote:
> Tom Lane wrote:
>> I've committed the HOT patch.
>
> Thanks, much easier to work with it now that it's in.
>
>> I'd still like to think about whether we can be smarter about when to
>> invoke pruning, but that's a small enough issue that the patch can go
>> in without it.
>
> Yeah. I'm doing some micro-benchmarking, and the attached test case is
> much slower with HOT. It's spending a lot of time trying to prune,
> only to find out that it can't. Instead of/in addition to avoiding
> pruning when it doesn't help, maybe we could make
> HeapTupleSatisfiesVacuum cheaper.
>
> I'm going to continue testing; this is just a heads-up that HOT as
> committed seriously hurts performance in some cases. (Though one can
> argue that this test case isn't a very realistic one.)

This might be a simplistic question, but if the page is +90% full and
there is a long-lived transaction, isn't Postgres going to try pruning
on each page read access?

--
  Bruce Momjian  <[EMAIL PROTECTED]>        http://momjian.us
  EnterpriseDB                              http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] HOT is applied
On 9/21/07, Bruce Momjian <[EMAIL PROTECTED]> wrote:
> This might be a simplistic question, but if the page is +90% full and
> there is a long-lived transaction, isn't Postgres going to try pruning
> on each page read access?

The way it stands today, yes. That's one reason why we are seeing the
performance drop in this particular case. I liked the earlier proposed
idea of storing the xid in the page header to quickly tell us whether
it's worth pruning the page.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com
Re: [HACKERS] HOT is applied
Bruce Momjian wrote:
> This might be a simplistic question, but if the page is +90% full and
> there is a long-lived transaction, isn't Postgres going to try pruning
> on each page read access?

Yes :(. That's why we earlier talked about storing the xid of the oldest
deleted tuple on the page in the page header. That way we could skip the
fruitless pruning attempts until that xid < OldestXmin.

Another approach is to try to make HeapTupleSatisfiesVacuum cheaper, so
that the fruitless pruning attempts wouldn't hurt that much.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] HOT is applied
Tom Lane wrote:
> Heikki Linnakangas <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> I'd still like to think about whether we can be smarter about when
>>> to invoke pruning, but that's a small enough issue that the patch
>>> can go in without it.
>>
>> Yeah. I'm doing some micro-benchmarking, and the attached test case
>> is much slower with HOT. It's spending a lot of time trying to prune,
>> only to find out that it can't.
>
> Not sure if that's an appropriate description or not. oprofile (on a
> dual Xeon running Fedora 6) shows me this:
> ...
> samples  %        symbol name
> 1070003  29.8708  LWLockAcquire
> 1015097  28.3380  LWLockRelease
>  283514   7.9147  heap_page_prune
> ...
> so basically it's all about the locking. Maybe the problem is that
> with HOT we lock the buffer too often? heap_page_prune_opt is designed
> to not take the buffer lock unless there's a good probability of
> needing to prune, but maybe that's not working as intended.

If you look at the callgraph, you'll see that those
LWLockAcquire/Release calls are coming from HeapTupleSatisfiesVacuum ->
TransactionIdIsInProgress, which keeps thrashing the ProcArrayLock. An
"if (TransactionIdIsCurrentTransactionId(xid)) return true;" check in
TransactionIdIsInProgress would speed that up, but I wonder if there's a
more general solution to make HeapTupleSatisfiesVacuum cheaper. For
example, we could cache the in-progress status of tuples.

> Shouldn't we be able to prune rows that have been inserted and deleted
> by the same transaction? I'd have hoped to see this example use only
> one heap page ...

Well, maybe, but that's a separate issue. Wouldn't we need the snapshot
bookkeeping we've discussed in the past, to notice that there's no
snapshot in our own backend that needs to see the tuples?

Nevertheless, the fruitless pruning attempts would still hurt us in
slightly more complex scenarios, even if we fixed the above.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] HOT is applied
Merlin Moncure wrote:
> On 9/20/07, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
>> Yeah. I'm doing some micro-benchmarking, and the attached test case
>> is much slower with HOT. It's spending a lot of time trying to prune,
>> only to find out that it can't. Instead of/in addition to avoiding
>> pruning when it doesn't help, maybe we could make
>> HeapTupleSatisfiesVacuum cheaper.
>>
>> I'm going to continue testing, this is just a heads-up that HOT as
>> committed seriously hurts performance in some cases. (though one can
>> argue that this test case isn't a very realistic one.)
>
> well, I ran your test on my box and here are the results:
>
> pre hot:
> run 1: 3617.641 ms
> run 2: 5195.215 ms
> run 3: 6760.449 ms
>
> after vacuum:
> run 1: 4171.362 ms
> run 2: 5513.317 ms
> run 3: 6884.125 ms
>
> post hot:
> run 1: Time: 7286.292 ms
> run 2: Time: 7477.089 ms
> run 3: Time: 7701.229 ms
>
> those results aren't exactly terrible, and this case is highly
> artificial.

Your runtimes seem to be increasing as you repeat the test. Did you
remove the DROP TABLE from the beginning? On my laptop, post-hot takes
~2x as long as pre-hot, even when repeated, which matches the results of
your first runs.

Yeah, it's highly artificial.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] HOT is applied
Merlin Moncure wrote:
> On 9/20/07, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
>> Yeah. I'm doing some micro-benchmarking, and the attached test case
>> is much slower with HOT. It's spending a lot of time trying to prune,
>> only to find out that it can't. Instead of/in addition to avoiding
>> pruning when it doesn't help, maybe we could make
>> HeapTupleSatisfiesVacuum cheaper.
>>
>> I'm going to continue testing, this is just a heads-up that HOT as
>> committed seriously hurts performance in some cases. (though one can
>> argue that this test case isn't a very realistic one.)
>
> well, I ran your test on my box and here are the results:
>
> pre hot:
> run 1: 3617.641 ms
> run 2: 5195.215 ms
> run 3: 6760.449 ms
>
> after vacuum:
> run 1: 4171.362 ms
> run 2: 5513.317 ms
> run 3: 6884.125 ms
>
> post hot:
> run 1: Time: 7286.292 ms
> run 2: Time: 7477.089 ms
> run 3: Time: 7701.229 ms
>
> those results aren't exactly terrible,

Your runtimes seem to be increasing as you repeat the test. Did you
remove the DROP TABLE from the beginning? On my laptop, post-hot takes
~2x as long as pre-hot, even when repeated, which matches the results of
your first runs.

> and this case is highly artificial.

Yeah. I'm going through a bunch of CPU test cases a colleague of mine
wrote. Some of the test cases show much worse performance with HOT.
Unfortunately they're quite complex and overlapping, so instead of
posting them as is, I'm going through the ones that HOT performs badly
at, and trying to reduce them to simpler test cases like the above.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] HOT is applied
On 9/21/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> so this example is getting past the heuristic tests in
> heap_page_prune_opt almost every time. Why is that? Too tired to poke
> at it more tonight.

I guess you already know the answer now, but anyway: since we are
updating a single tuple in a single transaction, each update is preceded
by a sequential scan. All but the last page are completely full and
marked prunable, so HOT would try to (unsuccessfully) prune every page
(except maybe the last).

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com
Re: [HACKERS] HOT is applied
On 9/21/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> It shouldn't, though --- the hint bit should get cleared on the first
> try. I think I probably broke something in the last round of revisions
> to heap_page_prune_opt, but haven't looked yet ...

We set the hint bit (prunable) again when we see a RECENTLY_DEAD or
DELETE_IN_PROGRESS tuple. This is correct in normal circumstances
because otherwise we would never be able to prune the page.

On second thought, maybe we can choose not to set the bit again. In the
worst case, the next update on the page would be a COLD update. That
would set the bit again, and we shall retry pruning on the next visit to
the page.

This scheme would work as long as we don't put in a mechanism to update
the FSM after pruning. But if we choose to update the FSM (I am inclined
to do this, to avoid repeated relation extension because of COLD
updates), we would need to improve this. Otherwise a page full of only
DEAD tuples might never be pruned, and vacuum would be required to
reclaim that space.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com
Re: [HACKERS] HOT is applied
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> Bruce Momjian wrote:
>> This might be a simplistic question, but if the page is +90% full and
>> there is a long-lived transaction, isn't Postgres going to try
>> pruning on each page read access?
>
> Yes :(

It shouldn't, though --- the hint bit should get cleared on the first
try. I think I probably broke something in the last round of revisions
to heap_page_prune_opt, but haven't looked yet ...

			regards, tom lane
Re: [HACKERS] HOT is applied
Tom Lane wrote:
> I don't much like the idea of adding an xid to the page header --- for
> one thing, *which* xid would you put there, and what would you test it
> against?

I was thinking that you would put the smallest in-progress xmax on the
page there, and you would test it against OldestXmin. If all
transactions commit, there surely isn't anything to prune until that xid
falls beyond the OldestXmin horizon. If an inserting transaction aborts,
we could prune the aborted tuple earlier, but optimizing for aborts
doesn't seem that important.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] HOT is applied
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> If you look at the callgraph, you'll see that those
> LWLockAcquire/Release calls are coming from HeapTupleSatisfiesVacuum
> -> TransactionIdIsInProgress, which keeps thrashing the ProcArrayLock.
> An "if (TransactionIdIsCurrentTransactionId(xid)) return true;" check
> in TransactionIdIsInProgress would speed that up, but I wonder if
> there's a more general solution to make HeapTupleSatisfiesVacuum
> cheaper. For example, we could cache the in-progress status of tuples.

Dunno about more general, but your idea reduces the runtime of this
example by about 50% (22.2s to 10.5s) for me. I'm worried though that it
would be a net negative in more typical situations, especially if you've
got a lot of open subtransactions.

			regards, tom lane

*** src/backend/storage/ipc/procarray.c.orig	Sat Sep  8 16:31:15 2007
--- src/backend/storage/ipc/procarray.c	Fri Sep 21 11:08:34 2007
***************
*** 341,346 ****
--- 341,353 ----
  		return false;
  	}

+ 	/*
+ 	 * Also, we can detect our own transaction without any access to
+ 	 * shared memory.
+ 	 */
+ 	if (TransactionIdIsCurrentTransactionId(xid))
+ 		return true;
+
  	/* Get workspace to remember main XIDs in */
  	xids = (TransactionId *)
  		palloc(sizeof(TransactionId) * arrayP->maxProcs);
Re: [HACKERS] HOT is applied
Pavan Deolasee <[EMAIL PROTECTED]> writes:
> On 9/21/07, Tom Lane <[EMAIL PROTECTED]> wrote:
>> so this example is getting past the heuristic tests in
>> heap_page_prune_opt almost every time. Why is that? Too tired to poke
>> at it more tonight.
>
> I guess you already know the answer now, but anyway: since we are
> updating a single tuple in a single transaction, each update is
> preceded by a sequential scan. All but the last page are completely
> full and marked prunable, so HOT would try to (unsuccessfully) prune
> every page (except maybe the last).

Hmm ... the problem really is that heap_page_prune turns the hint back
on when it sees a DELETE_IN_PROGRESS tuple. Maybe that's a bad idea.

I don't much like the idea of adding an xid to the page header --- for
one thing, *which* xid would you put there, and what would you test it
against?

			regards, tom lane
Re: [HACKERS] HOT is applied
Tom Lane wrote:
> Heikki Linnakangas <[EMAIL PROTECTED]> writes:
>> Bruce Momjian wrote:
>>> This might be a simplistic question, but if the page is +90% full
>>> and there is a long-lived transaction, isn't Postgres going to try
>>> pruning on each page read access?
>>
>> Yes :(
>
> It shouldn't, though --- the hint bit should get cleared on the first
> try.

Think so? That would mean that you only get a single chance to prune
after an update. Not sure how big a problem that is, but I do feel that
would make HOT a lot less effective in some usage patterns. The idea of
having a prunable xmin in the page header sounds more and more
attractive to me...

> I think I probably broke something in the last round of revisions to
> heap_page_prune_opt, but haven't looked yet ...

Pavan's patch was like that as well; you don't clear the flag (or
rather, you set it again while pruning) until there's nothing left to
prune on the page.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] HOT is applied
Tom Lane wrote:
> Heikki Linnakangas <[EMAIL PROTECTED]> writes:
>> If you look at the callgraph, you'll see that those
>> LWLockAcquire/Release calls are coming from HeapTupleSatisfiesVacuum
>> -> TransactionIdIsInProgress, which keeps thrashing the
>> ProcArrayLock. An "if (TransactionIdIsCurrentTransactionId(xid))
>> return true;" check in TransactionIdIsInProgress would speed that up,
>> but I wonder if there's a more general solution to make
>> HeapTupleSatisfiesVacuum cheaper. For example, we could cache the
>> in-progress status of tuples.
>
> Dunno about more general, but your idea reduces the runtime of this
> example by about 50% (22.2s to 10.5s) for me. I'm worried though that
> it would be a net negative in more typical situations, especially if
> you've got a lot of open subtransactions.

Yeah. I played with this a bit more, and came up with a couple of other
micro-optimizations:

1. Instead of pallocing and pfreeing a new array in
TransactionIdIsInProgress, we could just malloc the array once and reuse
it. That palloc/pfree alone was consuming around 8% of CPU time in the
test case.

2. About ~30% of CPU time is spent in HeapTupleSatisfiesMVCC ->
TransactionIdDidAbort. This is in fact not related to HOT, but it caught
my eye while looking at the profile. We have this piece of code in
HeapTupleSatisfiesMVCC:

	/* deleting subtransaction aborted? */
	/* FIXME -- is this correct w.r.t. the cmax of the tuple? */
	if (TransactionIdDidAbort(HeapTupleHeaderGetXmax(tuple)))
	{
		SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
					InvalidTransactionId);
		return true;
	}

	Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tuple)));

We've already checked that the xmin is our own transaction id, so we
check if the xmax is an aborted subtransaction of our own transaction. A
TransactionIdDidAbort call seems like an awfully expensive way to check
that. We could call TransactionIdIsCurrentTransactionId instead, which
doesn't need to access any shared memory structures (but might be
expensive if you have a lot of active subxacts, as you pointed out). Or
we could keep a list of aborted subtransactions in backend-private
memory, and just check against that. In this particular case, a simple
if (xmin == xmax) check would be effective as well, since we know that
the xmin is not aborted at that point.

3. One more general alternative to the little patch I sent earlier would
be to build an array of in-progress xids in TransactionIdIsInProgress,
like we do in GetSnapshotData, and use that for the in-progress checks
in HeapTupleSatisfiesVacuum. That should work, if we build the cached
array when we lock the page for pruning, and use it until we unlock. If
a transaction commits/aborts after we build the cached array, we might
return DELETE/INSERT_IN_PROGRESS for a tuple that is in fact already
DEAD, but that's OK. Since we're holding a lock on the page, it's not
possible for a new transaction to start and subsequently put its xid on
the page where we would bump into it.

I believe this would have roughly the same performance effect in this
test case as calling TransactionIdIsCurrentTransactionId from
TransactionIdIsInProgress.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] HOT is applied
I wrote:
> Dunno about more general, but your idea reduces the runtime of this
> example by about 50% (22.2s to 10.5s) for me. I'm worried though that
> it would be a net negative in more typical situations, especially if
> you've got a lot of open subtransactions.

Actually ... the only way that TransactionIdIsCurrentTransactionId can
take a meaningful amount of time is if you've got lots of
subtransactions, and in that case your own subxids cache has certainly
overflowed, which is likely to force TransactionIdIsInProgress into the
slow answer path. But if we use TransactionIdIsCurrentTransactionId to
handle our own xids then we can just ignore MyProc altogether inside the
loop, thus very possibly (if we are the only overflowed-subxids proc)
saving us from going through the slow answer path. So on reflection I
can't see how this isn't a win. I'll clean it up and apply it.

I'm also starting to come around to liking the page-header-xid field a
bit more. I suggest that it could replace the "page is prunable" flag
bit altogether --- to mark the page prunable, you must store some
appropriate xid into the header field. This would avoid a useless prune
attempt immediately after a page is marked prunable.

A possible problem with it, if we treat it as a non-WAL-logged hint, is
that corruption of the value could lead to a page being marked with an
xid that's way in the future, keeping it from getting pruned. However,
if the page header gets corrupted this is the least of your worries, so
it doesn't seem like an enormous objection.

			regards, tom lane
Re: [HACKERS] HOT is applied
On 9/21/07, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
> Merlin Moncure wrote:
>> pre hot:
>> run 1: 3617.641 ms
>> run 2: 5195.215 ms
>> run 3: 6760.449 ms
>>
>> after vacuum:
>> run 1: 4171.362 ms
>> run 2: 5513.317 ms
>> run 3: 6884.125 ms
>>
>> post hot:
>> run 1: Time: 7286.292 ms
>> run 2: Time: 7477.089 ms
>> run 3: Time: 7701.229 ms
>>
>> those results aren't exactly terrible, and this case is highly
>> artificial.
>
> Your runtimes seem to be increasing as you repeat the test. Did you
> remove the DROP TABLE from the beginning? On my laptop, post-hot takes
> ~2x as long as pre-hot, even when repeated, which matches the results
> of your first runs.

correct. Well, my first round of results are so far not showing the big
gains I saw with hot in some of the earlier patches... so far, it looks
approximately to be a wash, although with the reduced need to vacuum.
I'll test some more when things settle down.

merlin
Re: [HACKERS] HOT is applied
Merlin Moncure wrote:
> Well, my first round of results are so far not showing the big gains I
> saw with hot in some of the earlier patches... so far, it looks
> approximately to be a wash, although with the reduced need to vacuum.
> I'll test some more when things settle down.

Oh... Which version did you test earlier? What are you testing, pgbench?

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] HOT is applied
Tom Lane wrote:
> Actually ... the only way that TransactionIdIsCurrentTransactionId can
> take a meaningful amount of time is if you've got lots of
> subtransactions, and in that case your own subxids cache has certainly
> overflowed, which is likely to force TransactionIdIsInProgress into
> the slow answer path. But if we use
> TransactionIdIsCurrentTransactionId to handle our own xids then we can
> just ignore MyProc altogether inside the loop, thus very possibly (if
> we are the only overflowed-subxids proc) saving us from going through
> the slow answer path. So on reflection I can't see how this isn't a
> win. I'll clean it up and apply it.

Good point. I noted that most callers of TransactionIdIsInProgress in
tqual.c already call TransactionIdIsCurrentTransactionId before
TransactionIdIsInProgress. In those cases we could just skip the test
for our own xids altogether, if it's worth the code mangling to tell
TransactionIdIsInProgress whether it's safe to skip it or not.

> I'm also starting to come around to liking the page-header-xid field a
> bit more. I suggest that it could replace the "page is prunable" flag
> bit altogether --- to mark the page prunable, you must store some
> appropriate xid into the header field. This would avoid a useless
> prune attempt immediately after a page is marked prunable.

Yeah, we don't need the flag if we have an xid.

> A possible problem with it, if we treat it as a non-WAL-logged hint,
> is that corruption of the value could lead to a page being marked with
> an xid that's way in the future, keeping it from getting pruned.
> However, if the page header gets corrupted this is the least of your
> worries, so it doesn't seem like an enormous objection.

Agreed. The next attempt to prune or vacuum the page will fix it anyway.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] HOT is applied
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> Yeah. I played with this a bit more, and came up with a couple of
> other micro-optimizations:
>
> 1. Instead of pallocing and pfreeing a new array in
> TransactionIdIsInProgress, we could just malloc the array once and
> reuse it. That palloc/pfree alone was consuming around 8% of CPU time
> in the test case.

Good idea --- not only faster, but we can get rid of all the goto's and
the "locked" flag, if we're willing to have a couple more LWLockRelease
calls here. I'll incorporate this in the patch I'm working up.

> We've already checked that the xmin is our own transaction id, so we
> check if the xmax is an aborted subtransaction of our own transaction.
> A TransactionIdDidAbort call seems like an awfully expensive way to
> check that. We could call TransactionIdIsCurrentTransactionId instead,
> which doesn't need to access any shared memory structures (but might
> be expensive if you have a lot of active subxacts, as you pointed
> out).

I like that idea too ...

> 3. One more general alternative to the little patch I sent earlier
> would be to build an array of in-progress xids in
> TransactionIdIsInProgress, like we do in GetSnapshotData, and use that
> for the in-progress checks in HeapTupleSatisfiesVacuum. That should
> work, if we build the cached array when we lock the page for pruning,
> and use it until we unlock.

This seems like it uglifies a whole lot of APIs, though, for what's
probably not going to be a lot of gain.

			regards, tom lane
Re: [HACKERS] HOT is applied
On 9/21/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> I'm also starting to come around to liking the page-header-xid field a
> bit more. I suggest that it could replace the "page is prunable" flag
> bit altogether --- to mark the page prunable, you must store some
> appropriate xid into the header field. This would avoid a useless
> prune attempt immediately after a page is marked prunable.

Agreed. I was thinking of the minimum of the following:

1. xmin of an INSERT_IN_PROGRESS tuple
2. xmax of a DELETE_IN_PROGRESS tuple
3. xmax of a RECENTLY_DEAD tuple

When we attempt a prune, we can set the xid to the minimum of the above
three. If none of the above tuples exist in the page, the xid can be set
to FrozenXid. A page need not be pruned if its xid is set to FrozenXid
or is greater than OldestXmin.

In addition, I guess we can even set the xid when we insert, delete or
update from the page, unless it's already set to a lower value.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com
Re: [HACKERS] HOT is applied
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> We've already checked that the xmin is our own transaction id, so we
> check if the xmax is an aborted subtransaction of our own transaction.
> A TransactionIdDidAbort call seems like an awfully expensive way to
> check that. We could call TransactionIdIsCurrentTransactionId instead,
> which doesn't need to access any shared memory structures (but might
> be expensive if you have a lot of active subxacts, as you pointed
> out).

Applied --- seems to buy another 50% savings in the example with all the
pruning activity. I'll get after the pruning activity itself in a bit,
but it seems worth squeezing the tqual functions first while we have
this example that stresses them so much.

			regards, tom lane
Re: [HACKERS] HOT is applied
Heikki Linnakangas [EMAIL PROTECTED] writes:

> I noted that most callers of TransactionIdIsInProgress in tqual.c already call TransactionIdIsCurrentTransactionId before TransactionIdIsInProgress. In those cases we could just skip the test for our own xids altogether, if it's worth code mangling to tell TransactionIdIsInProgress whether it's safe to skip it or not.

At least in this example, it seems it wouldn't buy anything. HeapTupleSatisfiesVacuum doesn't call TransactionIdIsCurrentTransactionId, and while HeapTupleSatisfiesMVCC does, it falls out at that point and doesn't get to TransactionIdIsInProgress. So I'm not inclined to contort the API unless we can come up with an example where it's important. I'm off to work on the page-header-xid idea ...

regards, tom lane
Re: [HACKERS] HOT is applied
Tom Lane wrote:

> I've committed the HOT patch.

Thanks, much easier to work with it now that it's in.

> I'd still like to think about whether we can be smarter about when to invoke pruning, but that's a small enough issue that the patch can go in without it.

Yeah. I'm doing some micro-benchmarking, and the attached test case is much slower with HOT. It's spending a lot of time trying to prune, only to find out that it can't. Instead of, or in addition to, avoiding pruning when it doesn't help, maybe we could make HeapTupleSatisfiesVacuum cheaper. I'm going to continue testing; this is just a heads-up that HOT as committed seriously hurts performance in some cases (though one can argue that this test case isn't a very realistic one).

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com

DROP TABLE IF EXISTS testtable;
CREATE TABLE testtable (key integer);
-- Note that there's no index, so updates have to do a seq scan.

CREATE OR REPLACE FUNCTION testfunc(data int, key1 int) RETURNS int AS $$
DECLARE
BEGIN
  FOR cnt IN 1..10000 LOOP
    UPDATE testtable SET key = data WHERE key = key1;
  END LOOP;
  RETURN 1;
END;
$$ LANGUAGE plpgsql;

INSERT INTO testtable VALUES (1);

BEGIN;
SELECT testfunc(1,1);
COMMIT;
Re: [HACKERS] HOT is applied
On 9/20/07, Heikki Linnakangas [EMAIL PROTECTED] wrote:
> Tom Lane wrote:
>> I've committed the HOT patch.
> Thanks, much easier to work with it now that it's in.
>> I'd still like to think about whether we can be smarter about when to invoke pruning, but that's a small enough issue that the patch can go in without it.
> Yeah. I'm doing some micro-benchmarking, and the attached test case is much slower with HOT. It's spending a lot of time trying to prune, only to find out that it can't. Instead of/in addition to avoiding pruning when it doesn't help, maybe we could make HeapTupleSatisfiesVacuum cheaper. I'm going to continue testing, this is just a heads-up that HOT as committed seriously hurts performance in some cases. (though one can argue that this test case isn't a very realistic one.)

well, I ran your test on my box and here are the results:

pre hot:
run 1: 3617.641 ms
run 2: 5195.215 ms
run 3: 6760.449 ms

after vacuum:
run 1: 4171.362 ms
run 2: 5513.317 ms
run 3: 6884.125 ms

post hot:
run 1: 7286.292 ms
run 2: 7477.089 ms
run 3: 7701.229 ms

those results aren't exactly terrible, and this case is highly artificial.

merlin
Re: [HACKERS] HOT is applied
Heikki Linnakangas [EMAIL PROTECTED] writes:

> Tom Lane wrote:
>> I'd still like to think about whether we can be smarter about when to invoke pruning, but that's a small enough issue that the patch can go in without it.
> Yeah. I'm doing some micro-benchmarking, and the attached test case is much slower with HOT. It's spending a lot of time trying to prune, only to find out that it can't.

Not sure if that's an appropriate description or not. oprofile (on a dual Xeon running Fedora 6) shows me this:

CPU: P4 / Xeon with 2 hyper-threads, speed 2792.99 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 10

samples   %        symbol name
1070003   29.8708  LWLockAcquire
1015097   28.3380  LWLockRelease
 283514    7.9147  heap_page_prune
 252270    7.0425  AllocSetAlloc
 148981    4.1590  HeapTupleSatisfiesMVCC
 146442    4.0882  TransactionIdIsInProgress
  92673    2.5871  AllocSetFree
  84966    2.3720  HeapTupleSatisfiesVacuum
  83097    2.3198  TransactionIdGetStatus
  80737    2.2539  SimpleLruReadPage_ReadOnly
  52803    1.4741  TransactionLogFetch
  43406    1.2117  heapgetpage
  42536    1.1875  HeapTupleHeaderGetCmin
  34842    0.9727  TransactionIdIsCurrentTransactionId
  27761    0.7750  TransactionIdDidAbort
  24833    0.6933  MemoryContextAlloc
  16446    0.4591  TransactionIdPrecedes
  16089    0.4491  HeapTupleHeaderGetCmax
  12919    0.3607  hash_search_with_hash_value
  11857    0.3310  pfree
   4964    0.1386  heap_page_prune_opt
   3683    0.1028  hash_any
   3215    0.0898  LWLockConditionalAcquire
   3086    0.0862  PinBuffer
   2573    0.0718  UnpinBuffer
   2009    0.0561  ConditionalLockBufferForCleanup
   1854    0.0518  ReadBuffer_common
    934    0.0261  XLogInsert
    784    0.0219  heapgettup_pagemode

so basically it's all about the locking. Maybe the problem is that with HOT we lock the buffer too often? heap_page_prune_opt is designed to not take the buffer lock unless there's a good probability of needing to prune, but maybe that's not working as intended.

[ pokes at it a bit more... ] Actually the disturbing part is that pruning doesn't seem to be working at all: after the test finishes, I see

regression=# vacuum verbose testtable;
INFO: vacuuming "public.testtable"
INFO: "testtable": removed 1 row versions in 44 pages
INFO: "testtable": found 1 removable, 1 nonremovable row versions in 45 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 9955 unused item pointers.
45 pages contain useful free space.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

Shouldn't we be able to prune rows that have been inserted and deleted by the same transaction? I'd have hoped to see this example use only one heap page ...

regards, tom lane
Re: [HACKERS] HOT is applied
I wrote:
> ... so basically it's all about the locking. Maybe the problem is that with HOT we lock the buffer too often? heap_page_prune_opt is designed to not take the buffer lock unless there's a good probability of needing to prune, but maybe that's not working as intended.

Indeed it seems it's not; gprof shows

                0.00    0.00     111/1389276     index_getnext <cycle 7> [89]
                0.05   49.52 1389165/1389276     heapgetpage [6]
[8]     50.9    0.05   49.52 1389276             heap_page_prune_opt [8]
                7.17   42.31 1366696/1366696     heap_page_prune [9]
                0.01    0.03 1366696/1366696     ConditionalLockBufferForCleanup [106]
                0.01    0.00     278/2780795     PageGetHeapFreeSpace [177]

so this example is getting past the heuristic tests in heap_page_prune_opt almost every time. Why is that? Too tired to poke at it more tonight.

regards, tom lane
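The heuristic Tom refers to can be illustrated with a standalone sketch of such a pre-lock gate. This is simplified and hypothetical: `page_prune_worthwhile` is an invented name, not the actual heap_page_prune_opt logic, and the plain `<` comparison ignores xid wraparound.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int TransactionId;

#define FrozenTransactionId ((TransactionId) 2)

/*
 * Cheap pre-check: only attempt pruning (and take the buffer lock) when
 * the page's prune-hint xid says some tuple may have become dead to every
 * running transaction, i.e. the hint xid is older than OldestXmin.
 * NB: plain "<" ignores xid wraparound for brevity.
 */
static bool
page_prune_worthwhile(TransactionId page_prune_xid, TransactionId oldest_xmin)
{
    if (page_prune_xid == FrozenTransactionId)
        return false;           /* nothing flagged as prunable */
    return page_prune_xid < oldest_xmin;
}
```

The profile above suggests a gate of this kind was passing nearly every time in the test case: when the flagged xid belongs to the still-running updating transaction itself, the attempt is made repeatedly even though nothing can actually be pruned yet.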
Re: [HACKERS] HOT is applied
On 9/21/07, Tom Lane [EMAIL PROTECTED] wrote:

> Shouldn't we be able to prune rows that have been inserted and deleted by the same transaction? I'd have hoped to see this example use only one heap page ...

Not before the transaction commits? In the test, we update a single tuple 10000 times in the same transaction, so there is no opportunity to prune.

Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] HOT is applied
Pavan Deolasee [EMAIL PROTECTED] writes:

> On 9/21/07, Tom Lane [EMAIL PROTECTED] wrote:
>> Shouldn't we be able to prune rows that have been inserted and deleted by the same transaction? I'd have hoped to see this example use only one heap page ...
> Not before the transaction commits? In the test, we update a single tuple 10000 times in the same transaction. So there is no opportunity to prune.

[ looks a bit more ... ] Hm, the test I was thinking of was this one at the end of HeapTupleSatisfiesVacuum:

    if (TransactionIdEquals(HeapTupleHeaderGetXmin(tuple),
                            HeapTupleHeaderGetXmax(tuple)))
    {
        /*
         * Inserter also deleted it, so it was never visible to anyone else.
         * However, we can only remove it early if it's not an updated tuple;
         * else its parent tuple is linking to it via t_ctid, and this tuple
         * mustn't go away before the parent does.
         */
        if (!(tuple->t_infomask & HEAP_UPDATED))
            return HEAPTUPLE_DEAD;
    }

but control never gets that far because neither xmin nor xmax is committed yet. We could fix that, probably, by considering the xmin=xmax case in the xmin-in-progress case further up; but the HEAP_UPDATED exclusion is still a problem. Still, it seems like this is leaving some money on the table when you think about pruning a HOT chain. Can we improve on it easily?

regards, tom lane
Re: [HACKERS] HOT is applied
On 9/21/07, Tom Lane [EMAIL PROTECTED] wrote:

> but control never gets that far because neither xmin nor xmax is committed yet. We could fix that, probably, by considering the xmin=xmax case in the xmin-in-progress case further up; but the HEAP_UPDATED exclusion is still a problem. Still, it seems like this is leaving some money on the table when you think about pruning a HOT chain. Can we improve on it easily?

Maybe we can, but it would get a bit tricky. There might be a transaction looking at the first tuple in the chain and waiting for this (inserting-deleting) transaction to finish. If the waiting transaction is running in READ COMMITTED mode, it would then follow the update chain. Removing any intermediate tuples without fixing the previous tuple's xmax/ctid (or the redirected line pointer) would be tricky.

Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com