Re: [HACKERS] Open issues for HOT patch
Gregory Stark wrote:
> "Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
>> There is one wacky idea I haven't dared to propose yet:
>>
>> We could lift the limitation that you can't defragment a page that's
>> pinned, if we play some smoke and mirrors in the buffer manager. When
>> you prune a page, make a *copy* of the page you're pruning, and keep
>> both versions in the buffer cache. Old pointers keep pointing to the old
>> version. Any new calls to ReadBuffer will return the new copy, and the
>> old copy can be dropped when its pin count drops to zero.
>
> Fwiw when Heikki first mentioned this idea I thought it was the craziest
> thing I ever heard. But the more I thought about it the more I liked it.
> I've come to the conclusion that while it's a wart, it's not much worse
> than the wart of the super-exclusive lock which it replaces. In fact it's
> arguably cleaner in some ways.

I read this as very similar to RCU. In some scenarios it makes a lot of sense. I don't think it's a thing to be attacked in 8.3 though, since it is a big change to the bufmgr.

-- 
Alvaro Herrera                        http://www.advogato.org/person/alvherre
"The PostgreSQL hackers have what I call a 'NASA space shot' mentality.
 Quite refreshing in a world of 'weekend drag racer' developers." (Scott Marlowe)

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
       http://www.postgresql.org/docs/faq
Re: [HACKERS] Open issues for HOT patch
On Sep 19, 2007, at 8:08 AM, Tom Lane wrote:
> Decibel! <[EMAIL PROTECTED]> writes:
>> 3 isn't that important to me, but 4 is:
>> 4. Doesn't hammer the database to measure
>> And pgstattuple fails #4 miserably. Want to know the average dead space
>> in a 500GB database? Yeah, right
>
> So we could put a vacuum_cost_delay() in it ...

pg_stat_delay()? ;)

That's better than what we have now, without a doubt. But I'd still prefer to have a table I can just read on the fly. We do have a TODO to replace the current stats infrastructure with something that has less overhead, right? :)

-- 
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828
Re: [HACKERS] Open issues for HOT patch
Decibel! <[EMAIL PROTECTED]> writes:
> 3 isn't that important to me, but 4 is:
> 4. Doesn't hammer the database to measure
> And pgstattuple fails #4 miserably. Want to know the average dead space
> in a 500GB database? Yeah, right

So we could put a vacuum_cost_delay() in it ...

			regards, tom lane
Re: [HACKERS] Open issues for HOT patch
"Heikki Linnakangas" <[EMAIL PROTECTED]> writes:
> There is one wacky idea I haven't dared to propose yet:
>
> We could lift the limitation that you can't defragment a page that's
> pinned, if we play some smoke and mirrors in the buffer manager. When
> you prune a page, make a *copy* of the page you're pruning, and keep
> both versions in the buffer cache. Old pointers keep pointing to the old
> version. Any new calls to ReadBuffer will return the new copy, and the
> old copy can be dropped when its pin count drops to zero.

Fwiw when Heikki first mentioned this idea I thought it was the craziest thing I ever heard. But the more I thought about it the more I liked it. I've come to the conclusion that while it's a wart, it's not much worse than the wart of the super-exclusive lock which it replaces. In fact it's arguably cleaner in some ways.

As a result vacuum would never have to wait for arbitrarily long pins, and there wouldn't be the concept of a vacuum waiting for a vacuum lock with strange lock queueing semantics. It also means we could move tuples around on the page more freely.

The only places which would have to deal with a possible new buffer are precisely those places that lock the page. If you aren't locking the page then you definitely aren't about to fiddle with any bits that matter, since your writes could be lost. Certainly you're not about to set xmin or xmax or anything like that. You might set hint bits which would be lost, but probably not often, since you would have already checked the visibility of the tuples with the page locked. There may be one or two places where we fiddle bits for a tuple we've just inserted ourselves, thinking nobody else can see it yet, but the current philosophy seems to be leaning towards treating such coding practices as unacceptably fragile anyway.

The buffer manager doesn't really need to track multiple "versions" of pages. It would just mark the old version as an orphaned buffer which is automatically a victim for the clock sweep as soon as the pin count drops to 0. It will never need to return such a buffer. What we would need is enough information to reread the buffer if someone tries to lock it, and to unpin it when someone unpins a newer version.

At first I thought the cost of copying the page would be a downside, but Heikki pointed out that in defragmentation we're already copying the page. Copying it to new memory, instead of to memory that is almost certainly in processor caches and would need to be invalidated, might actually be faster; avoiding the use of memmove could be faster too.

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
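The "orphaned buffer" scheme sketched above can be modeled in a few lines. This is a toy illustration only, not the PostgreSQL buffer manager API; all class and field names here are invented, and the real mechanism would involve pin counts tracked in shared buffer headers and the clock-sweep replacement algorithm.

```python
# Toy model of the copy-on-prune idea: pruning a pinned page creates a
# new copy; existing pins keep the old copy alive, new ReadBuffer calls
# see only the new copy, and the old copy becomes reclaimable once its
# pin count drops to zero.

class BufferManager:
    def __init__(self):
        self.current = {}   # page id -> latest version of the page
        self.orphans = []   # superseded versions kept only while pinned

    def read_buffer(self, page_id):
        buf = self.current.setdefault(
            page_id, {"page": page_id, "pins": 0, "version": 1})
        buf["pins"] += 1    # ReadBuffer always returns the latest version
        return buf

    def release_buffer(self, buf):
        buf["pins"] -= 1
        # An orphaned version with no pins is immediately a victim for reuse.
        self.orphans = [o for o in self.orphans if o["pins"] > 0]

    def prune(self, page_id):
        old = self.current[page_id]
        new = {"page": page_id, "pins": 0, "version": old["version"] + 1}
        if old["pins"] > 0:
            self.orphans.append(old)   # old pinners keep their stale copy
        self.current[page_id] = new

mgr = BufferManager()
a = mgr.read_buffer(1)     # some scan pins the page
mgr.prune(1)               # pruning replaces the page while 'a' stays pinned
b = mgr.read_buffer(1)     # a new reader sees the pruned version
assert a["version"] == 1 and b["version"] == 2
mgr.release_buffer(a)      # last pin on the old copy gone -> orphan reclaimed
assert mgr.orphans == []
```

The model also shows why LockBuffer is the critical point in Heikki's variant: a backend holding the stale copy `a` must be handed `b` before it may modify anything.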
Re: [HACKERS] Open issues for HOT patch
Tom Lane wrote:
> Heikki Linnakangas <[EMAIL PROTECTED]> writes:
>> We could lift the limitation that you can't defragment a page that's
>> pinned, if we play some smoke and mirrors in the buffer manager. When
>> you prune a page, make a *copy* of the page you're pruning, and keep
>> both versions in the buffer cache. Old pointers keep pointing to the old
>> version. Any new calls to ReadBuffer will return the new copy, and the
>> old copy can be dropped when its pin count drops to zero.
>
> No, that's way too wacky. How do you prevent people from making further
> changes to the "old" version? For instance, marking a tuple deleted?

To make any changes to the "old" version, you need to lock the page with LockBuffer. LockBuffer needs to return a buffer with the latest version of the page, and the caller has to use that version for any changes. Changing all callers of LockBuffer (that lock heap pages) to do that is the biggest change involved, AFAICS. Hint bit updates to the old version we could just forget about.

> The actual practical application we have, I think, would only require
> being able to defrag a page that our own backend has pins on, which is
> something that might be more workable --- but it still seems awfully
> fragile. It could maybe be made to work in the simplest case of a
> plain UPDATE, because in practice I think the executor will never
> reference the old tuple's contents after heap_update() returns. But
> this falls down in more complex situations involving joins --- we might
> continue to try to join the same "old" tuple to other rows, and then any
> pass-by-reference Datums we are using are corrupt if the tuple got
> moved.

Ugh, yeah, that's too fragile.

Another wacky idea: within our own backend, we could keep track of which tuples we've accessed, and defrag could move all other tuples as long as the ones that we might still have pointers to are not touched. The bookkeeping wouldn't have to be exact, as long as it's conservative.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] Open issues for HOT patch
On 9/19/07, Simon Riggs <[EMAIL PROTECTED]> wrote:
> In your last post you mentioned multiple UPDATEs. Pruning multiple times
> for successive UPDATEs isn't going to release more space, so why do it?

I guess the point is that when we do the first update, there is enough free space in the page and hence we won't prune the page. We run out of free space for subsequent updates and want to prune, but now can't prune because the executor is holding a pin on the buffer.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com
Re: [HACKERS] Open issues for HOT patch
Decibel! wrote:
> On Tue, Sep 18, 2007 at 11:32:52AM -0400, Tom Lane wrote:
>>> Another option would be to prune whenever the free space goes
>>> below table fillfactor and hope that users would set fillfactor so that
>>> at least one updated tuple can fit in the block. I know it's not best to
>>> rely on the users though. But it can be a good hint.
>> If default fillfactor weren't 100% then this might be good ;-). But
>
> Erik Jones and I were just talking about FILLFACTOR...
>
> Is the plan to keep it at 100% with HOT? ISTM that's not such a great
> idea, since it forces at least the first update (if not many more) to be
> COLD.

I think we should still keep it at 100%. Most tables are not updated, and a non-100% fillfactor would be a waste of space when the extra space is not needed.

Even a table that is updated should reach a steady state after a few cold updates. Those cold updates will make room on the pages for future updates, now that we can prune them and leave only dead line pointers behind.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] Open issues for HOT patch
On Tue, 2007-09-18 at 12:10 -0400, Tom Lane wrote:
> I wrote:
>> * The patch makes undocumented changes that cause autovacuum's decisions
>> to be driven by total estimated dead space rather than total number of
>> dead tuples. Do we like this?
>
> No one seems to have picked up on this point, but after reflection
> I think there's actually a pretty big problem here. Per-page pruning
> is perfectly capable of keeping dead space in check. In a system with
> HOT running well, the reasons to vacuum a table will be:
>
> 1. Remove dead index entries.
> 2. Remove LP_DEAD line pointers.
> 3. Truncate off no-longer-used end pages.
> 4. Transfer knowledge about free space into FSM.
>
> Pruning cannot accomplish #1, #2, or #3, and without significant changes
> in the FSM infrastructure it has no hope about #4 either. What I'm
> afraid of is that steady page-level pruning will keep the amount of dead
> space low, causing autovacuum never to fire, causing the indexes to
> bloat indefinitely because of #1 and the table itself to bloat
> indefinitely because of #2 and #4. Thus, the proposed change in
> autovacuum seems badly misguided: instead of making autovacuum trigger
> on things that only it can fix, it makes autovacuum trigger on something
> that per-page pruning can deal with perfectly well.
>
> I'm inclined to think that we should continue to drive autovac off a
> count of dead rows, as this is directly related to points #1 and #2,
> and doesn't seem any worse for #3 and #4 than an estimate based on space
> would be. Possibly it would be sensible for per-page pruning to report
> a reduction in number of dead rows when it removes heap-only tuples,
> but I'm not entirely sure --- any thoughts?

Some behavioural comments only: I was part of the earlier discussion about when-to-VACUUM and don't have any fixed view of how to do this.

If HOT is running well, then there will be less need for #1, #3 and #4, as I understand it.

Deletes will still cause the need for #1, #3 and #4, as well as dead-space removal. Many tables have only Inserts and Deletes, so we need to take that into account.

On large tables, VACUUM hurts very badly, so I would like to see it run significantly less often.

In your last post you mentioned multiple UPDATEs. Pruning multiple times for successive UPDATEs isn't going to release more space, so why do it?

-- 
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
Re: [HACKERS] Open issues for HOT patch
Decibel! wrote:
> On Tue, Sep 18, 2007 at 09:31:03AM -0700, Joshua D. Drake wrote:
>>> If we do this, then it's not clear that having pgstats track dead space
>>> is worth the trouble at all. It might possibly be of value for testing
>>> purposes to see how well pruning is doing, but I'm unconvinced that it's
>>> worth bloating stats messages and files to have this number in a
>>> production system. An alternative that would serve as well for testing
>>> would be to teach contrib/pgstattuple to measure dead space.
>> As a DBA, I can say it doesn't really matter to me *how we track* the
>> dead space, as long as tracking it is:
>>
>> 1. Clear
>> 2. Simple
>> 3. Available by default (thus pgstattuple needs to push into core)
>
> 3 isn't that important to me, but 4 is:
>
> 4. Doesn't hammer the database to measure
>
> And pgstattuple fails #4 miserably. Want to know the average dead space
> in a 500GB database? Yeah, right

Point taken, and agreed (although I still think it needs to be in core).

Joshua D. Drake

-- 
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
Re: [HACKERS] Open issues for HOT patch
On Tue, Sep 18, 2007 at 09:31:03AM -0700, Joshua D. Drake wrote:
> Tom Lane wrote:
>> I wrote:
>>> * The patch makes undocumented changes that cause autovacuum's decisions
>>> to be driven by total estimated dead space rather than total number of
>>> dead tuples. Do we like this?
>>
>> If we do this, then it's not clear that having pgstats track dead space
>> is worth the trouble at all. It might possibly be of value for testing
>> purposes to see how well pruning is doing, but I'm unconvinced that it's
>> worth bloating stats messages and files to have this number in a
>> production system. An alternative that would serve as well for testing
>> would be to teach contrib/pgstattuple to measure dead space.
>
> As a DBA, I can say it doesn't really matter to me *how we track* the
> dead space, as long as tracking it is:
>
> 1. Clear
> 2. Simple
> 3. Available by default (thus pgstattuple needs to push into core)

3 isn't that important to me, but 4 is:

4. Doesn't hammer the database to measure

And pgstattuple fails #4 miserably. Want to know the average dead space in a 500GB database? Yeah, right

-- 
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828
Re: [HACKERS] Open issues for HOT patch
On Tue, Sep 18, 2007 at 11:32:52AM -0400, Tom Lane wrote:
>> Another option would be to prune whenever the free space goes
>> below table fillfactor and hope that users would set fillfactor so that
>> at least one updated tuple can fit in the block. I know it's not best to
>> rely on the users though. But it can be a good hint.
>
> If default fillfactor weren't 100% then this might be good ;-). But

Erik Jones and I were just talking about FILLFACTOR...

Is the plan to keep it at 100% with HOT? ISTM that's not such a great idea, since it forces at least the first update (if not many more) to be COLD.

I realize that ideally we'd probably want FILLFACTOR to take things like average tuple size and average number of updates per page into account, but for a first pass 90% would likely be a good compromise...

-- 
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828
Re: [HACKERS] Open issues for HOT patch
On 9/18/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> "Pavan Deolasee" <[EMAIL PROTECTED]> writes:
>> On 9/18/07, Tom Lane <[EMAIL PROTECTED]> wrote:
>>> In a system with HOT running well, the reasons to vacuum a table will be:
>>>
>>> 1. Remove dead index entries.
>>> 2. Remove LP_DEAD line pointers.
>>> 3. Truncate off no-longer-used end pages.
>>> 4. Transfer knowledge about free space into FSM.
>>>
>>> Pruning cannot accomplish #1, #2, or #3, and without significant changes
>>> in the FSM infrastructure it has no hope about #4 either.
>>
>> I guess we already have mechanism to remove dead index entries
>> outside vacuum.
>
> Not a trustworthy one --- unless you have a solid proposal for making it
> work with bitmap indexscans, it would be foolish to design autovacuum
> behavior on the assumption that dead index entries aren't a problem.

Hmm.. I think we need to drop this for now, because I am sure any such proposal would need a lot more discussion. Maybe something we can pick up for 8.4.

So we go back to tracking dead tuples. I would still be inclined towards tracking non-HOT dead tuples, or subtracting the count of pruned HOT tuples, because we don't want to trigger autovacuum too often; rather, let pruning clean as much dead space as possible. What this also means is that the tuple storage reclaimed by pruning non-HOT dead tuples does not impact the autovacuum behavior, positively or negatively. And ISTM that this does not address #4?

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com
Re: [HACKERS] Open issues for HOT patch
"Pavan Deolasee" <[EMAIL PROTECTED]> writes:
> On 9/18/07, Tom Lane <[EMAIL PROTECTED]> wrote:
>> In a system with HOT running well, the reasons to vacuum a table will be:
>>
>> 1. Remove dead index entries.
>> 2. Remove LP_DEAD line pointers.
>> 3. Truncate off no-longer-used end pages.
>> 4. Transfer knowledge about free space into FSM.
>>
>> Pruning cannot accomplish #1, #2, or #3, and without significant changes
>> in the FSM infrastructure it has no hope about #4 either.
>
> I guess we already have mechanism to remove dead index entries
> outside vacuum.

Not a trustworthy one --- unless you have a solid proposal for making it work with bitmap indexscans, it would be foolish to design autovacuum behavior on the assumption that dead index entries aren't a problem. (Also, IIRC only btree has been taught to recover dead entries at all.)

			regards, tom lane
Re: [HACKERS] Open issues for HOT patch
On 9/18/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> * I'm still pretty unhappy about the patch's use of a relcache copy of
> GetAvgFSMRequestSize()'s result. The fact that there's no provision for
> ever updating the value while the relcache entry lives is part of it,
> but the bigger part is that I'd rather not have anything at all
> depending on that number.

We could fix the first part by adding relcache invalidation whenever the average FSM request size changes by a certain margin. But I am not insisting on using the avgFSM mechanism to decide when to prune.

> Perhaps we could replace that heuristic with something that is
> page-local; seems like dividing the total used space by the number of
> item pointers would give at least a rough approximation of the page's
> average tuple size.

We might get it completely wrong unless we know the number of normal line pointers (redirected, dead and unused line pointers do not take any real storage).

Another option would be to prune whenever the free space goes below the table fillfactor, and hope that users would set fillfactor so that at least one updated tuple can fit in the block. I know it's not best to rely on the users, though. But it can be a good hint.

Yet another option would be to set a hint on the page whenever we fail to do a HOT update because of not-enough-free-space in the block. Next time we shall prune, and so the subsequent updates would be HOT updates.

None of these are perfect, though.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com
Re: [HACKERS] Open issues for HOT patch
On 9/18/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> I wrote:
>> * The patch makes undocumented changes that cause autovacuum's decisions
>> to be driven by total estimated dead space rather than total number of
>> dead tuples. Do we like this?
>
> No one seems to have picked up on this point, but after reflection
> I think there's actually a pretty big problem here. Per-page pruning
> is perfectly capable of keeping dead space in check. In a system with
> HOT running well, the reasons to vacuum a table will be:
>
> 1. Remove dead index entries.
> 2. Remove LP_DEAD line pointers.
> 3. Truncate off no-longer-used end pages.
> 4. Transfer knowledge about free space into FSM.
>
> Pruning cannot accomplish #1, #2, or #3, and without significant changes
> in the FSM infrastructure it has no hope about #4 either.

I guess we already have a mechanism to remove dead index entries outside vacuum. So my take would be to handle vacuum based on dead index entries separately. Maybe we can track the number of dead index entries and trigger vacuum on the base relation if it goes beyond a threshold.

For LP_DEAD line pointers, with some adjustments to the patch, we can make it track dead space in a page by accounting for the LP_DEAD pointers. So if there is bloat because of LP_DEAD pointers, that will be reflected in the dead space and help us trigger vacuum on the table.

> I'm inclined to think that we should continue to drive autovac off a
> count of dead rows

If we do that, I guess it would make sense to count only non-HOT dead tuples, because HOT tuples neither create LP_DEAD line pointers nor cause index bloat.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com
Re: [HACKERS] Open issues for HOT patch
Tom Lane wrote:
> I wrote:
>> * The patch makes undocumented changes that cause autovacuum's decisions
>> to be driven by total estimated dead space rather than total number of
>> dead tuples. Do we like this?
>
> If we do this, then it's not clear that having pgstats track dead space
> is worth the trouble at all. It might possibly be of value for testing
> purposes to see how well pruning is doing, but I'm unconvinced that it's
> worth bloating stats messages and files to have this number in a
> production system. An alternative that would serve as well for testing
> would be to teach contrib/pgstattuple to measure dead space.

As a DBA, I can say it doesn't really matter to me *how we track* the dead space, as long as tracking it is:

1. Clear
2. Simple
3. Available by default (thus pgstattuple needs to push into core)

Sincerely,

Joshua D. Drake

-- 
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
Re: [HACKERS] Open issues for HOT patch
I wrote:
> * The patch makes undocumented changes that cause autovacuum's decisions
> to be driven by total estimated dead space rather than total number of
> dead tuples. Do we like this?

No one seems to have picked up on this point, but after reflection I think there's actually a pretty big problem here. Per-page pruning is perfectly capable of keeping dead space in check. In a system with HOT running well, the reasons to vacuum a table will be:

1. Remove dead index entries.
2. Remove LP_DEAD line pointers.
3. Truncate off no-longer-used end pages.
4. Transfer knowledge about free space into FSM.

Pruning cannot accomplish #1, #2, or #3, and without significant changes in the FSM infrastructure it has no hope about #4 either. What I'm afraid of is that steady page-level pruning will keep the amount of dead space low, causing autovacuum never to fire, causing the indexes to bloat indefinitely because of #1 and the table itself to bloat indefinitely because of #2 and #4. Thus, the proposed change in autovacuum seems badly misguided: instead of making autovacuum trigger on things that only it can fix, it makes autovacuum trigger on something that per-page pruning can deal with perfectly well.

I'm inclined to think that we should continue to drive autovac off a count of dead rows, as this is directly related to points #1 and #2, and doesn't seem any worse for #3 and #4 than an estimate based on space would be. Possibly it would be sensible for per-page pruning to report a reduction in number of dead rows when it removes heap-only tuples, but I'm not entirely sure --- any thoughts?

If we do this, then it's not clear that having pgstats track dead space is worth the trouble at all. It might possibly be of value for testing purposes to see how well pruning is doing, but I'm unconvinced that it's worth bloating stats messages and files to have this number in a production system. An alternative that would serve as well for testing would be to teach contrib/pgstattuple to measure dead space.

Comments?

			regards, tom lane
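For context, the dead-row-count trigger Tom wants to keep is autovacuum's standard one: vacuum a table once the tracked number of dead tuples exceeds a fixed threshold plus a fraction of the table's size (the `autovacuum_vacuum_threshold` and `autovacuum_vacuum_scale_factor` settings). A minimal sketch, with placeholder numbers rather than the actual defaults of any particular release, including Tom's tentative suggestion that pruning could report removed heap-only tuples as a reduction in the dead-row count:

```python
# Toy model of the count-of-dead-rows autovacuum trigger.
# The threshold/scale values below are illustrative placeholders.

def needs_vacuum(n_dead_tuples, reltuples,
                 vacuum_threshold=500, vacuum_scale_factor=0.2):
    # Fire when dead tuples exceed threshold + scale_factor * table size.
    return n_dead_tuples > vacuum_threshold + vacuum_scale_factor * reltuples

# Tom's open question: should pruning decrement the dead-row count when
# it removes heap-only tuples? If so, steady pruning delays autovacuum:
n_dead = 250_000
assert needs_vacuum(n_dead, reltuples=1_000_000)          # would fire
n_dead -= 100_000        # pruning reported removal of heap-only tuples
assert not needs_vacuum(n_dead, reltuples=1_000_000)      # no longer fires
```

The second assertion illustrates the tension in the thread: discounting pruned tuples avoids needless vacuums, but it must not hide the LP_DEAD line pointers and dead index entries that only vacuum can reclaim.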
Re: [HACKERS] Open issues for HOT patch
On 9/18/07, Tom Lane <[EMAIL PROTECTED]> wrote:
>> Another option would be to prune whenever the free space goes
>> below table fillfactor
>
> If default fillfactor weren't 100% then this might be good ;-). But
> we could use max(1-fillfactor, BLCKSZ/8) or some such.
>
>> Yet another option would be to set a hint on the page whenever we
>> fail to do HOT update
>
> This would be a good idea independent of anything else, I think.

Or maybe a combination of the above two would work well. If the user has not specified any fillfactor (and so the table is using the default of 100), the first update is most likely a COLD update and we set the hint bit. The retired tuple is then pruned before the next update comes, and we shall do a HOT update. OTOH, if the user has specified a fillfactor low enough to avoid any COLD update, the first update will be HOT. At that point, the used space would go above the fillfactor, and hence we shall prune before the next update comes, making room for the next HOT update.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com
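The default-fillfactor sequence Pavan walks through (first update COLD, hint set, prune, then steady HOT updates) can be modeled as a toy page. This is an invented illustration, simplified so that any update leaving a dead tuple version behind marks the page prunable; it is not the patch's actual data structures.

```python
# Toy page model of the hint-bit/prune cycle described above.

class Page:
    def __init__(self, free_space, tuple_size):
        self.free_space = free_space    # bytes available for a new version
        self.tuple_size = tuple_size    # assume fixed-size tuples
        self.prune_hint = False         # "prune me next time" page hint
        self.dead_space = 0             # space held by prunable dead versions

    def update(self):
        if self.prune_hint:
            # Prune before the update: reclaim all prunable space.
            self.free_space += self.dead_space
            self.dead_space = 0
            self.prune_hint = False
        if self.free_space >= self.tuple_size:
            self.free_space -= self.tuple_size   # HOT: new version on same page
            self.dead_space += self.tuple_size   # old version prunable later
            self.prune_hint = True
            return "HOT"
        # Not enough room: COLD update elsewhere; old version dies here,
        # and we set the hint so the next visit prunes first.
        self.dead_space += self.tuple_size
        self.prune_hint = True
        return "COLD"

# Fillfactor 100%: the page starts full, so only the first update is COLD.
page = Page(free_space=0, tuple_size=100)
results = [page.update() for _ in range(4)]
assert results == ["COLD", "HOT", "HOT", "HOT"]
```

This matches the steady state Heikki predicts earlier in the thread: a few cold updates make room, after which pruning keeps the page self-sufficient.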
Re: [HACKERS] Open issues for HOT patch
"Pavan Deolasee" <[EMAIL PROTECTED]> writes:
> On 9/18/07, Tom Lane <[EMAIL PROTECTED]> wrote:
>> Perhaps we could replace that heuristic with something that is
>> page-local; seems like dividing the total used space by the number of
>> item pointers would give at least a rough approximation of the page's
>> average tuple size.
>
> We might get it completely wrong unless we know the number of
> normal line pointers (redirected, dead and unused line pointers
> do not take any real storage).

Sure, but it's only a heuristic anyway. Probably a more serious objection is that it fails in the wrong direction: if you start to get line pointer bloat then the estimated average tuple size goes down, making it less likely to prune instead of more. But maybe do something that looks at free space and number of pointers independently, rather than operating in terms of average tuple size?

> Another option would be to prune whenever the free space goes
> below table fillfactor and hope that users would set fillfactor so that
> at least one updated tuple can fit in the block. I know it's not best to
> rely on the users though. But it can be a good hint.

If default fillfactor weren't 100% then this might be good ;-). But we could use max(1-fillfactor, BLCKSZ/8) or some such.

> Yet another option would be to set a hint on the page whenever we
> fail to do HOT update because of not-enough-free-space in the
> block. Next time we shall prune and so the subsequent updates
> would be HOT update.

This would be a good idea independent of anything else, I think. There's plenty of room for page hint bits, and the write will be "free" since we must set the old tuple XMAX anyway.

			regards, tom lane
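The two heuristics under discussion can be written out concretely. These are toy versions only: the field names are invented, `max(1-fillfactor, BLCKSZ/8)` is interpreted here as bytes (fillfactor fraction scaled by block size), and none of this is the actual patch code.

```python
# Toy versions of the pruning heuristics discussed in this subthread.

BLCKSZ = 8192  # default PostgreSQL block size in bytes

def avg_tuple_size(used_space, n_item_pointers):
    # Tom's page-local estimate: total used space divided by the number
    # of item pointers. As noted above, this fails in the wrong direction:
    # redirected/dead/unused pointers take no real storage, so line
    # pointer bloat drags the estimate down, making pruning *less* likely.
    return used_space / max(n_item_pointers, 1)

def should_prune_fillfactor(free_space, fillfactor):
    # Tom's fillfactor variant: prune once free space falls below the
    # larger of the fillfactor reservation and a BLCKSZ/8 floor. The
    # floor matters because at the default fillfactor of 100% the
    # reservation would otherwise be zero and pruning would never fire.
    reserved = max((1.0 - fillfactor) * BLCKSZ, BLCKSZ / 8)
    return free_space < reserved

# Default fillfactor: only the BLCKSZ/8 floor (1024 bytes) applies.
assert should_prune_fillfactor(free_space=512, fillfactor=1.0)
assert not should_prune_fillfactor(free_space=2048, fillfactor=1.0)
# Lower fillfactor: the 30% reservation (~2458 bytes) dominates.
assert should_prune_fillfactor(free_space=2048, fillfactor=0.7)

# Pavan's objection to the average-size estimate, in numbers: same used
# space, but doubling the (mostly storage-free) pointers halves the estimate.
assert avg_tuple_size(4000, 40) == 100.0
assert avg_tuple_size(4000, 80) == 50.0
```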
Re: [HACKERS] Open issues for HOT patch
Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> But then what happens when you want to update a second tuple on the same
>>> page? None of our existing plan types release and reacquire pin if they
>>> don't have to, and I really doubt that we want to give up that
>>> optimization.
>
>> You will prune when you lock the page and at that point unless you got
>> enough room for both tuples I doubt trying just before the second tuple
>> is going to help.
>
> No, you're missing the point completely. If the free space on the page
> is, say, 1.5x the average tuple size, the code *won't* prune, and then
> it will be stuck when it goes to do the second tuple update, because
> there is no chance to reconsider the prune/no-prune decision after some
> space is eaten by the first update.

My point is that if you only do this for INSERT/UPDATE, you can prune when you have less than enough room for 3-4 tuples, and if you add the xmin of the earliest prune xact you can prune even more aggressively.

-- 
Bruce Momjian <[EMAIL PROTECTED]>     http://momjian.us
EnterpriseDB                          http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] Open issues for HOT patch
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> There is one wacky idea I haven't dared to propose yet:
> We could lift the limitation that you can't defragment a page that's
> pinned, if we play some smoke and mirrors in the buffer manager. When
> you prune a page, make a *copy* of the page you're pruning, and keep
> both versions in the buffer cache. Old pointers keep pointing to the old
> version. Any new calls to ReadBuffer will return the new copy, and the
> old copy can be dropped when its pin count drops to zero.

No, that's way too wacky. How do you prevent people from making further changes to the "old" version? For instance, marking a tuple deleted?

The actual practical application we have, I think, would only require being able to defrag a page that our own backend has pins on, which is something that might be more workable --- but it still seems awfully fragile. It could maybe be made to work in the simplest case of a plain UPDATE, because in practice I think the executor will never reference the old tuple's contents after heap_update() returns. But this falls down in more complex situations involving joins --- we might continue to try to join the same "old" tuple to other rows, and then any pass-by-reference Datums we are using are corrupt if the tuple got moved.

			regards, tom lane
Re: [HACKERS] Open issues for HOT patch
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> But then what happens when you want to update a second tuple on the same
>> page?  None of our existing plan types release and reacquire pin if they
>> don't have to, and I really doubt that we want to give up that
>> optimization.

> You will prune when you lock the page and at that point unless you got
> enough room for both tuples I doubt trying just before the second tuple
> is going to help.

No, you're missing the point completely.  If the free space on the page
is, say, 1.5x the average tuple size, the code *won't* prune, and then
it will be stuck when it goes to do the second tuple update, because
there is no chance to reconsider the prune/no-prune decision after some
space is eaten by the first update.

			regards, tom lane
Re: [HACKERS] Open issues for HOT patch
Tom Lane wrote:
> * We also need to think harder about when to invoke the page pruning
> code.  As the patch stands, if you set a breakpoint at
> heap_page_prune_opt it'll seem to be hit constantly (eg, once for every
> system catalog probe), which seems uselessly often.  And yet it also
> seems not often enough, because one thing I found out real fast is that
> the "prune if free space < 1.2 average tuple size" heuristic fails badly
> when you look at queries that execute multiple updates within the same
> heap page.  We only prune when we first pin a particular target page,
> and so the additional updates don't afford another chance to see if it's
> time to prune.
>
> I'd like to see if we can arrange to only do pruning when reading a page
> that is known to be an update target (ie, never during plain SELECTs);
> I suspect this would be relatively easy with some executor and perhaps
> planner changes.  But that only fixes the first half of the gripe above;
> I'm not at all sure what to do about the multiple-updates-per-page
> issue.

There is one wacky idea I haven't dared to propose yet:

We could lift the limitation that you can't defragment a page that's
pinned, if we play some smoke and mirrors in the buffer manager. When
you prune a page, make a *copy* of the page you're pruning, and keep
both versions in the buffer cache. Old pointers keep pointing to the old
version. Any new calls to ReadBuffer will return the new copy, and the
old copy can be dropped when its pin count drops to zero.

Tracking multiple copies of a page requires some changes to the buffer
manager.  LockBuffer would need to return the latest version of the
page, because anything that checks visibility or does updates would need
to use the latest copy, and callers of LockBuffer would need to be
changed accordingly.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
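The copy-on-prune idea (which Alvaro compares to RCU elsewhere in the thread) can be sketched as a small versioned buffer table. All names here are invented, and the real buffer manager works with shared-memory buffer descriptors rather than anything like this; the sketch only shows the version-visibility rule being proposed:

```python
# Minimal sketch of the copy-on-prune scheme: the buffer table always
# hands out the newest copy of a page; older copies stay readable for
# whoever already pinned them, and become droppable at pin count zero.
# Invented names; not the PostgreSQL bufmgr API.

class BufferManager:
    def __init__(self, pages):
        self.latest = dict(pages)      # page_no -> newest page copy
        self.pins = {}                 # copy identity -> pin count

    def read_buffer(self, page_no):
        """New readers always see the newest copy (the ReadBuffer rule)."""
        copy = self.latest[page_no]
        self.pins[id(copy)] = self.pins.get(id(copy), 0) + 1
        return copy

    def release_buffer(self, copy):
        self.pins[id(copy)] -= 1
        if self.pins[id(copy)] == 0:
            del self.pins[id(copy)]    # unpinned old copies can be dropped

    def prune(self, page_no, new_contents):
        """Install a defragmented copy; already-pinned copies are untouched."""
        self.latest[page_no] = new_contents

mgr = BufferManager({1: ["old", "layout"]})
old = mgr.read_buffer(1)           # a long-running scan pins page 1
mgr.prune(1, ["new", "layout"])    # prune installs a defragmented copy
new = mgr.read_buffer(1)           # later readers get the new copy
print(old)                         # the scan's view is unchanged
print(new)
mgr.release_buffer(old)            # pin count hits zero: old copy droppable
```

Tom's objection maps directly onto this sketch: nothing stops a caller from writing through `old`, which is why LockBuffer (and everything that modifies tuples) would have to be redirected to the latest copy.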
Re: [HACKERS] Open issues for HOT patch
Bruce Momjian wrote:
> If we only prune on an update (or insert) why not just do prune every
> time?  I figure the prune/defrag has to be lighter than the
> update/insert itself.

Pruning is a quite costly operation.  You need to check the visibility
of each tuple on the page, following tuple chains as you go, mark line
pointers as not used or redirected, and finally memmove all the tuples
to remove the gaps between them.  And it needs to be WAL-logged.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
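The steps Heikki enumerates can be sketched roughly as follows. This is a toy layout with invented names, and it omits most of what makes real pruning expensive: following HOT update chains, creating redirecting line pointers in place, and WAL-logging the result:

```python
# Toy prune pass: check visibility of each tuple, mark dead tuples'
# line pointer slots unused, and compact the survivors so free space
# becomes contiguous (the memmove step).  Not the real page format.

def prune_page(tuples, is_visible):
    """tuples: one payload per line pointer slot.
    Returns (line pointer array, compacted tuple storage)."""
    line_pointers = []
    storage = []
    for t in tuples:                        # per-tuple visibility check
        if is_visible(t):
            line_pointers.append(len(storage))  # slot -> new location
            storage.append(t)                   # memmove-style compaction
        else:
            line_pointers.append(None)          # mark slot unused
    return line_pointers, storage

lps, storage = prune_page(["a", "dead1", "b", "dead2"],
                          lambda t: not t.startswith("dead"))
print(lps)       # [0, None, 1, None] -- slots keep their positions
print(storage)   # ['a', 'b']
```

Note that the line pointer slots keep their positions even as tuples move; that indirection is what lets index entries stay valid across a prune, and it is also why every step above has to touch the whole page.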
Re: [HACKERS] Open issues for HOT patch
Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > Tom Lane wrote:
> >> The problem is you can't prune anymore once you have existing pin on the
> >> target page.  I'd really like to get around that, but so far it seems
> >> unacceptably fragile --- the executor really doesn't expect tuples to
> >> get moved around underneath it.
>
> > I thought you could do the pruning before you pin the page only in
> > update/insert cases.
>
> But then what happens when you want to update a second tuple on the same
> page?  None of our existing plan types release and reacquire pin if they
> don't have to, and I really doubt that we want to give up that
> optimization.

You will prune when you lock the page and at that point unless you got
enough room for both tuples I doubt trying just before the second tuple
is going to help.

--
  Bruce Momjian  <[EMAIL PROTECTED]>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com
Re: [HACKERS] Open issues for HOT patch
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> The problem is you can't prune anymore once you have existing pin on the
>> target page.  I'd really like to get around that, but so far it seems
>> unacceptably fragile --- the executor really doesn't expect tuples to
>> get moved around underneath it.

> I thought you could do the pruning before you pin the page only in
> update/insert cases.

But then what happens when you want to update a second tuple on the same
page?  None of our existing plan types release and reacquire pin if they
don't have to, and I really doubt that we want to give up that
optimization.

			regards, tom lane
Re: [HACKERS] Open issues for HOT patch
Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > If we only prune on an update (or insert) why not just do prune every
> > time?
>
> The problem is you can't prune anymore once you have existing pin on the
> target page.  I'd really like to get around that, but so far it seems
> unacceptably fragile --- the executor really doesn't expect tuples to
> get moved around underneath it.

I thought you could do the pruning before you pin the page only in
update/insert cases.

--
  Bruce Momjian  <[EMAIL PROTECTED]>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com
Re: [HACKERS] Open issues for HOT patch
Bruce Momjian <[EMAIL PROTECTED]> writes:
> If we only prune on an update (or insert) why not just do prune every
> time?

The problem is you can't prune anymore once you have existing pin on the
target page.  I'd really like to get around that, but so far it seems
unacceptably fragile --- the executor really doesn't expect tuples to
get moved around underneath it.

			regards, tom lane
Re: [HACKERS] Open issues for HOT patch
Tom Lane wrote:
> * We also need to think harder about when to invoke the page pruning
> code.  As the patch stands, if you set a breakpoint at
> heap_page_prune_opt it'll seem to be hit constantly (eg, once for every
> system catalog probe), which seems uselessly often.  And yet it also
> seems not often enough, because one thing I found out real fast is that
> the "prune if free space < 1.2 average tuple size" heuristic fails badly
> when you look at queries that execute multiple updates within the same
> heap page.  We only prune when we first pin a particular target page,
> and so the additional updates don't afford another chance to see if it's
> time to prune.
>
> I'd like to see if we can arrange to only do pruning when reading a page
> that is known to be an update target (ie, never during plain SELECTs);
> I suspect this would be relatively easy with some executor and perhaps
> planner changes.  But that only fixes the first half of the gripe above;
> I'm not at all sure what to do about the multiple-updates-per-page
> issue.

If we only prune on an update (or insert) why not just do prune every
time?  I figure the prune/defrag has to be lighter than the
update/insert itself.

--
  Bruce Momjian  <[EMAIL PROTECTED]>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com