Re: [HACKERS] Quick idea for reducing VACUUM contention
On Jul 30, 2007, at 1:47 PM, Alvaro Herrera wrote:
> Jim Nasby wrote:
>> On Jul 27, 2007, at 1:49 AM, Alvaro Herrera wrote:
>>> ITAGAKI Takahiro wrote:
>>>
>>> It would be cool if we could do something like sweep a range of pages,
>>> initiate IO for those that are not in shared buffers, and while that is
>>> running, lock and clean up the ones that are in shared buffers, skipping
>>> those that are not lockable right away; when that's done, go back to
>>> those buffers that were gotten from I/O and clean those up.  And retry
>>
>> Would that be substantially easier than just creating a bgreader?
>
> I'm not sure about easier, but I'm not sure that the bgreader can do
> the same job.  ISTM that the bgreader would be mostly in charge of
> reading in advance of backends, whereas what I'm proposing is mostly
> about finding the best spot for locking.
>
> It might turn out to be more trouble than it's worth though, for sure.
> And in any case I'm not in a hurry to implement it.

I was referring specifically to the "read in what's not already in
shared buffers" part of Itagaki-san's message... that seems to be
something best suited for a bgreader.
--
Decibel!, aka Jim Nasby                         [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Quick idea for reducing VACUUM contention
Decibel! wrote:
> On Jul 30, 2007, at 8:00 PM, Alvaro Herrera wrote:
>> ITAGAKI Takahiro wrote:
>>> Alvaro Herrera <[EMAIL PROTECTED]> wrote:
>>>>> I think we might need additional "freezing-xmax" operations to avoid
>>>>> XID-wraparound in the first pass of vacuum, though it hardly occurs.
>>>>
>>>> I'm not sure I follow.  Can you elaborate?  Do you mean storing a
>>>> separate relfrozenxmax for each table or something like that?
>>>
>>> We need to work around wraparound of xmax in dead tuples.  If we fail
>>> to vacuum them and the XID wraps around, we cannot remove them until
>>> the next XID wraparound, because we treat them as deleted in the
>>> *future*.
>>
>> Oh, but this should not be a problem, because a tuple is either frozen
>> or removed completely -- xmax cannot precede xmin.
>
> What if it's frozen, then deleted, and then we wrap on xmax?  Wouldn't
> that make the tuple re-appear?

That cannot happen, because the next vacuum will remove the tuple if the
Xmax is committed.  If the deleting transaction aborts, then vacuum will
set Xmax to Invalid (see heap_freeze_tuple in heapam.c).

One potential problem you would see is if the deleting transaction marks
it deleted and then does not commit for 2 billion transactions, so that
vacuum is not able to remove it because it shows up as
delete-in-progress.  However, there are plenty of other problems you
would hit in that case (autovacuum starting to misbehave being the first
you would probably notice).
--
Alvaro Herrera                       http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
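Alvaro's argument above can be modeled in a few lines: for a frozen tuple that was later deleted, the next vacuum either removes it (deleter committed) or invalidates its xmax (deleter aborted), so no ancient xmax survives to wrap around. A toy Python model, not heapam.c:

```python
# Toy model of the freeze-or-remove invariant; not PostgreSQL code.
FROZEN_XID, INVALID_XID = 2, 0   # FrozenTransactionId / InvalidTransactionId

def next_vacuum(tup, deleter_state):
    """What the next vacuum does to a frozen tuple with a set xmax."""
    if deleter_state == "committed":
        return None                    # tuple removed outright
    if deleter_state == "aborted":
        tup["xmax"] = INVALID_XID      # cf. heap_freeze_tuple in heapam.c
    return tup                         # "in progress": left alone for now

t = {"xmin": FROZEN_XID, "xmax": 4200}
print(next_vacuum(dict(t), "committed"))          # None: removed
print(next_vacuum(dict(t), "aborted")["xmax"])    # 0: xmax invalidated
```

Either way, the old xmax is gone before the XID counter can wrap, which is why the tuple cannot "re-appear".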
Re: [HACKERS] Quick idea for reducing VACUUM contention
On Jul 30, 2007, at 8:00 PM, Alvaro Herrera wrote:
> ITAGAKI Takahiro wrote:
>> Alvaro Herrera <[EMAIL PROTECTED]> wrote:
>>>> I think we might need additional "freezing-xmax" operations to avoid
>>>> XID-wraparound in the first pass of vacuum, though it hardly occurs.
>>>
>>> I'm not sure I follow.  Can you elaborate?  Do you mean storing a
>>> separate relfrozenxmax for each table or something like that?
>>
>> We need to work around wraparound of xmax in dead tuples.  If we fail
>> to vacuum them and the XID wraps around, we cannot remove them until
>> the next XID wraparound, because we treat them as deleted in the
>> *future*.
>
> Oh, but this should not be a problem, because a tuple is either frozen
> or removed completely -- xmax cannot precede xmin.

What if it's frozen, then deleted, and then we wrap on xmax?  Wouldn't
that make the tuple re-appear?
--
Decibel!, aka Jim Nasby                         [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)
Re: [HACKERS] Quick idea for reducing VACUUM contention
ITAGAKI Takahiro wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> wrote:
>>> I think we might need additional "freezing-xmax" operations to avoid
>>> XID-wraparound in the first pass of vacuum, though it hardly occurs.
>>
>> I'm not sure I follow.  Can you elaborate?  Do you mean storing a
>> separate relfrozenxmax for each table or something like that?
>
> We need to work around wraparound of xmax in dead tuples.  If we fail
> to vacuum them and the XID wraps around, we cannot remove them until
> the next XID wraparound, because we treat them as deleted in the
> *future*.

Oh, but this should not be a problem, because a tuple is either frozen
or removed completely -- xmax cannot precede xmin.

>>> We just add the XID of the vacuum to dead tuples we see in the first
>>> pass.  When backends find a dead tuple and see that the transaction
>>> identified by the XID in it has committed, they can freely reuse the
>>> area of the dead tuple, because we can assume index entries pointing
>>> to the tuple have been removed by the vacuum.
>>
>> I would be worried about leftover index entries being later used by new
>> tuples in the heap.  Then when you visit the index, find that entry, go
>> to the heap and find the new tuple and return it, which could be bogus.
>
> To avoid dangling index entries, I'm thinking about reusing dead tuples
> only if we see that the VACUUM transaction has committed successfully.
> That means the VACUUM transaction removed all index entries
> corresponding to those dead tuples; they are now Heap-Only Tuples, so
> we can recycle them in the same manner as HOT-updated tuples.

Hmm.  OK, I admit I have no idea how HOT works.
--
Alvaro Herrera                       http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Quick idea for reducing VACUUM contention
Alvaro Herrera <[EMAIL PROTECTED]> wrote:
>> I think we might need additional "freezing-xmax" operations to avoid
>> XID-wraparound in the first pass of vacuum, though it hardly occurs.
>
> I'm not sure I follow.  Can you elaborate?  Do you mean storing a
> separate relfrozenxmax for each table or something like that?

We need to work around wraparound of xmax in dead tuples.  If we fail to
vacuum them and the XID wraps around, we cannot remove them until the
next XID wraparound, because we treat them as deleted in the *future*.

>> We just add the XID of the vacuum to dead tuples we see in the first
>> pass.  When backends find a dead tuple and see that the transaction
>> identified by the XID in it has committed, they can freely reuse the
>> area of the dead tuple, because we can assume index entries pointing
>> to the tuple have been removed by the vacuum.
>
> I would be worried about leftover index entries being later used by new
> tuples in the heap.  Then when you visit the index, find that entry, go
> to the heap and find the new tuple and return it, which could be bogus.

To avoid dangling index entries, I'm thinking about reusing dead tuples
only if we see that the VACUUM transaction has committed successfully.
That means the VACUUM transaction removed all index entries corresponding
to those dead tuples; they are now Heap-Only Tuples, so we can recycle
them in the same manner as HOT-updated tuples.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
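Itagaki-san's "optimistic" reuse rule (stamp dead tuples with the vacuum's XID in the first pass, and let backends recycle them only once that XID is known committed) might look roughly like this. An illustrative Python model; all names here are made up, and the committed-XID set stands in for the clog:

```python
COMMITTED_XIDS = set()   # toy stand-in for the clog: XIDs known committed

def vacuum_first_pass(dead_tuples, vacuum_xid):
    # Pass 1 (index entries already removed): stamp each dead tuple with
    # the vacuum's XID instead of making a separate second heap pass.
    for t in dead_tuples:
        t["vacuum_xid"] = vacuum_xid

def backend_can_reuse(t):
    # A backend may recycle the slot only once the stamping VACUUM has
    # committed, i.e. its index entries are guaranteed gone (HOT-style).
    xid = t.get("vacuum_xid")
    return xid is not None and xid in COMMITTED_XIDS

dead = [{"ctid": (7, 3)}]
vacuum_first_pass(dead, vacuum_xid=5000)
print(backend_can_reuse(dead[0]))   # False: the vacuum hasn't committed
COMMITTED_XIDS.add(5000)            # the VACUUM transaction commits
print(backend_can_reuse(dead[0]))   # True: safe to recycle the slot
```

The commit check is what makes the scheme safe against Alvaro's dangling-index-entry worry: before the vacuum commits, no backend touches the slot.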
Re: [HACKERS] Quick idea for reducing VACUUM contention
Jim Nasby wrote:
> On Jul 27, 2007, at 1:49 AM, Alvaro Herrera wrote:
>> ITAGAKI Takahiro wrote:
>>> "Simon Riggs" <[EMAIL PROTECTED]> wrote:
>>>> Read the heap blocks in sequence, but make a conditional lock for
>>>> cleanup on each block.  If we don't get it, sleep, then try again
>>>> when we wake up.  If we fail the second time, just skip the block
>>>> completely.
>>
>> It would be cool if we could do something like sweep a range of pages,
>> initiate IO for those that are not in shared buffers, and while that is
>> running, lock and clean up the ones that are in shared buffers, skipping
>> those that are not lockable right away; when that's done, go back to
>> those buffers that were gotten from I/O and clean those up.  And retry
>> the locking for those that couldn't be locked the first time around,
>> also conditionally.  And when that's all done, a third pass could get
>> those blocks that weren't cleaned up in any of the previous passes (and
>> this time the lock would not be conditional).
>
> Would that be substantially easier than just creating a bgreader?

I'm not sure about easier, but I'm not sure that the bgreader can do the
same job.  ISTM that the bgreader would be mostly in charge of reading
in advance of backends, whereas what I'm proposing is mostly about
finding the best spot for locking.

It might turn out to be more trouble than it's worth though, for sure.
And in any case I'm not in a hurry to implement it.

In any case I'm not so sure about skipping vacuuming a block if it's not
lockable.
--
Alvaro Herrera                http://www.amazon.com/gp/registry/CTMLCN8V17R4
"Crear es tan difícil como ser libre" ("Creating is as hard as being
free") (Elsa Triolet)
Re: [HACKERS] Quick idea for reducing VACUUM contention
On Jul 27, 2007, at 1:49 AM, Alvaro Herrera wrote:
> ITAGAKI Takahiro wrote:
>> "Simon Riggs" <[EMAIL PROTECTED]> wrote:
>>> Read the heap blocks in sequence, but make a conditional lock for
>>> cleanup on each block.  If we don't get it, sleep, then try again
>>> when we wake up.  If we fail the second time, just skip the block
>>> completely.
>
> It would be cool if we could do something like sweep a range of pages,
> initiate IO for those that are not in shared buffers, and while that is
> running, lock and clean up the ones that are in shared buffers, skipping
> those that are not lockable right away; when that's done, go back to
> those buffers that were gotten from I/O and clean those up.  And retry
> the locking for those that couldn't be locked the first time around,
> also conditionally.  And when that's all done, a third pass could get
> those blocks that weren't cleaned up in any of the previous passes (and
> this time the lock would not be conditional).

Would that be substantially easier than just creating a bgreader?
--
Jim Nasby                                       [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)
Re: [HACKERS] Quick idea for reducing VACUUM contention
ITAGAKI Takahiro wrote:
> "Simon Riggs" <[EMAIL PROTECTED]> wrote:
>> Read the heap blocks in sequence, but make a conditional lock for
>> cleanup on each block.  If we don't get it, sleep, then try again
>> when we wake up.  If we fail the second time, just skip the block
>> completely.

It would be cool if we could do something like sweep a range of pages,
initiate IO for those that are not in shared buffers, and while that is
running, lock and clean up the ones that are in shared buffers, skipping
those that are not lockable right away; when that's done, go back to
those buffers that were gotten from I/O and clean those up.  And retry
the locking for those that couldn't be locked the first time around,
also conditionally.  And when that's all done, a third pass could get
those blocks that weren't cleaned up in any of the previous passes (and
this time the lock would not be conditional).  Then do a vacuum_delay
sleep.

> When we allow some skips in removing dead tuples, can we guarantee
> pg_class.relfrozenxid?

No, we can't.

> I think we might need additional "freezing-xmax" operations to avoid
> XID-wraparound in the first pass of vacuum, though it hardly occurs.

I'm not sure I follow.  Can you elaborate?  Do you mean storing a
separate relfrozenxmax for each table or something like that?

> It might be a future topic ... if we are going in the direction of
> "optimistic sweeping", is it possible to remove the second pass of
> vacuum completely?  We just add the XID of the vacuum to dead tuples
> we see in the first pass.  When backends find a dead tuple and see
> that the transaction identified by the XID in it has committed, they
> can freely reuse the area of the dead tuple, because we can assume
> index entries pointing to the tuple have been removed by the vacuum.

I would be worried about leftover index entries being later used by new
tuples in the heap.  Then when you visit the index, find that entry, go
to the heap and find the new tuple and return it, which could be bogus.
(Unless, I think, you check the index when you are going to insert the
new index tuple -- if the CTID is already used, reuse that entry or
remove it before insertion.)

I don't know.  Maybe it's OK, but it seems messy even if it is.
--
Alvaro Herrera                       http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
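The three-pass sweep Alvaro describes can be sketched as follows. This is an illustrative Python simulation, not PostgreSQL code; every hook (`in_shared_buffers`, `try_lock`, `read_io`, `clean`, `lock`) is a hypothetical stand-in:

```python
def sweep_range(pages, in_shared_buffers, try_lock, read_io, clean, lock):
    """Three-pass sweep of a page range: prefetch uncached pages, clean
    cached ones conditionally, then mop up the stragglers."""
    pending_io, retry = [], []
    # Pass 1: start I/O for uncached pages; conditionally clean cached ones
    for p in pages:
        if not in_shared_buffers(p):
            read_io(p)               # initiate the read
            pending_io.append(p)
        elif try_lock(p):
            clean(p)
        else:
            retry.append(p)          # busy right now: come back later
    # Pass 2: pages brought in by I/O, plus a second conditional try
    third = []
    for p in pending_io + retry:
        if try_lock(p):
            clean(p)
        else:
            third.append(p)
    # Pass 3: whatever is left, this time waiting for the lock for real
    for p in third:
        lock(p)
        clean(p)

# --- tiny demo with stub hooks ---
cleaned, cached, busy_once, attempts = [], {0, 2, 4}, {2}, {}

def try_lock(p):
    attempts[p] = attempts.get(p, 0) + 1
    return not (p in busy_once and attempts[p] == 1)  # busy on 1st try only

sweep_range([0, 1, 2, 3, 4],
            in_shared_buffers=lambda p: p in cached,
            try_lock=try_lock,
            read_io=lambda p: cached.add(p),   # pretend the read completes
            clean=cleaned.append,
            lock=lambda p: None)
print(sorted(cleaned))   # [0, 1, 2, 3, 4] -- every page eventually cleaned
```

Unlike Simon's skip-after-two-tries variant, the unconditional third pass guarantees every page is eventually cleaned, which also answers Alvaro's reservation about skipped blocks and relfrozenxid.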
Re: [HACKERS] Quick idea for reducing VACUUM contention
"Simon Riggs" <[EMAIL PROTECTED]> wrote:
> Read the heap blocks in sequence, but make a conditional lock for
> cleanup on each block.  If we don't get it, sleep, then try again
> when we wake up.  If we fail the second time, just skip the block
> completely.

When we allow some skips in removing dead tuples, can we guarantee
pg_class.relfrozenxid?  I think we might need additional "freezing-xmax"
operations to avoid XID-wraparound in the first pass of vacuum, though
it hardly occurs.

It might be a future topic ... if we are going in the direction of
"optimistic sweeping", is it possible to remove the second pass of
vacuum completely?  We just add the XID of the vacuum to dead tuples we
see in the first pass.  When backends find a dead tuple and see that the
transaction identified by the XID in it has committed, they can freely
reuse the area of the dead tuple, because we can assume index entries
pointing to the tuple have been removed by the vacuum.  We would use the
infrastructure introduced by HOT for this purpose.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
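The "deleted in the *future*" hazard raised above comes from circular XID arithmetic: once the counter wraps past an unvacuumed tuple's xmax, that xmax compares as newer than the current XID. A minimal sketch of the modulo-2^32 comparison in the style of PostgreSQL's TransactionIdPrecedes (toy model; the reserved special XIDs 0-2 are ignored here):

```python
def xid_precedes(a, b):
    """a precedes b iff the signed 32-bit difference (a - b) is negative,
    which makes XID ordering circular rather than linear."""
    return ((a - b) & 0xFFFFFFFF) >= 0x80000000

# A dead tuple whose xmax was assigned long ago...
xmax = 100
current = (xmax + 2_200_000_000) % 2**32   # XID counter ~2.2 billion later

# ...now compares as *newer* than the current XID: the tuple looks
# "deleted in the future", so vacuum can no longer remove it.
print(xid_precedes(current, xmax))   # True
```

This is exactly why skipped dead tuples would need some "freezing-xmax" treatment before the counter can travel half-way around the circle.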
[HACKERS] Quick idea for reducing VACUUM contention
Just wanted to record a quick idea in case it's useful in the future.

VACUUM reads all blocks in sequence and waits on each one to acquire a
cleanup lock.  If VACUUM is running with vacuum_delay enabled, we might
take a slightly different approach:

Read the heap blocks in sequence, but make a conditional lock for
cleanup on each block.  If we don't get it, sleep, then try again when
we wake up.  If we fail the second time, just skip the block completely.

As long as we skip no more than 1% of the blocks, we should be able to
do a very good job of cleanup, yet with reduced block contention as the
VACUUM proceeds.
--
Simon Riggs
EnterpriseDB  http://www.enterprisedb.com
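The retry-then-skip loop can be simulated in a few lines. An illustrative Python sketch, not PostgreSQL code; the per-attempt lock-failure probability is an assumption, chosen only to show that independent failures make double failures (and hence skips) rare:

```python
import random

random.seed(7)  # deterministic demo

def try_cleanup_lock(blk, p_busy=0.005):
    # Hypothetical stand-in for a *conditional* cleanup-lock attempt:
    # fails when some backend still has the page pinned (p_busy chance).
    return random.random() >= p_busy

def vacuum_pass(nblocks):
    """Sweep blocks in sequence; on a failed conditional lock, nap and
    retry once; on a second failure, skip the block completely."""
    skipped = []
    for blk in range(nblocks):
        if try_cleanup_lock(blk):
            continue                  # got the lock: clean the page
        # here a real VACUUM would sleep (the vacuum_delay nap)...
        if not try_cleanup_lock(blk):
            skipped.append(blk)       # still busy: leave for next VACUUM
    return skipped

skipped = vacuum_pass(100_000)
# with independent failures, two in a row happen ~p_busy**2 of the time,
# so the skipped fraction stays far below the 1% target
print(len(skipped) / 100_000)
```

The independence assumption is the optimistic part: a page pinned by a long-running cursor fails both attempts, which is the case the later replies in the thread worry about.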