Re: [PATCHES] Make CLUSTER MVCC-safe
Heikki Linnakangas <[EMAIL PROTECTED]> writes: >> This patch makes CLUSTER MVCC-safe. Visibility information and update >> chains are preserved like in VACUUM FULL. > Here's an update, fixing conflict by Tom's recent commit of Simon's > patch to skip WAL-inserts when archiving is not enabled. Applied with revisions. There were some bugs in it: you need to check both xmin and tid when determining if one tuple chains to another, and you can't separate MarkBufferDirty from the critical section that writes xlog. (I got around that by not keeping the working page in buffers at all, the same way btree index build works; should be a bit faster as well as more correct.) It had some memory leakage too. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PATCHES] Make CLUSTER MVCC-safe
Heikki Linnakangas wrote: > Alvaro Herrera wrote: > > Heikki Linnakangas wrote: > >> Here's an update, fixing conflict by Tom's recent commit of Simon's > >> patch to skip WAL-inserts when archiving is not enabled. > > > > Hmm, wouldn't it be better if the rewriteheap.c file was in > > access/heap/ instead of commands/? > > Yeah, maybe. I thought of it as a subsystem of cluster, and possibly > other commands like alter table. But it really is pretty low-level. So, does the patch need adjustment? Should we move it before application? -- Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PATCHES] Make CLUSTER MVCC-safe
Alvaro Herrera wrote: Heikki Linnakangas wrote: Here's an update, fixing conflict by Tom's recent commit of Simon's patch to skip WAL-inserts when archiving is not enabled. Hmm, wouldn't it be better if the rewriteheap.c file was in access/heap/ instead of commands/? Yeah, maybe. I thought of it as a subsystem of cluster, and possibly other commands like alter table. But it really is pretty low-level. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PATCHES] Make CLUSTER MVCC-safe
Heikki Linnakangas wrote: > Here's an update, fixing conflict by Tom's recent commit of Simon's > patch to skip WAL-inserts when archiving is not enabled. Hmm, wouldn't it be better if the rewriteheap.c file was in access/heap/ instead of commands/? -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PATCHES] Make CLUSTER MVCC-safe
Your patch has been added to the PostgreSQL unapplied patches list at: http://momjian.postgresql.org/cgi-bin/pgpatches It will be applied as soon as one of the PostgreSQL committers reviews and approves it. --- Heikki Linnakangas wrote: > Here's an update, fixing conflict by Tom's recent commit of Simon's > patch to skip WAL-inserts when archiving is not enabled. > > Heikki Linnakangas wrote: > > This patch makes CLUSTER MVCC-safe. Visibility information and update > > chains are preserved like in VACUUM FULL. > > > > I created a new generic rewriteheap-facility to handle rewriting tables > > in a visibility-preserving manner. All the update chain tracking is done > > in rewriteheap.c, the caller is responsible for supplying the stream of > > tuples. > > > > CLUSTER is currently the only user of the facility, but I'm envisioning > > we might have other users in the future. For example, a version of > > VACUUM FULL that rewrites the whole table. We could also use it to make > > ALTER TABLE MVCC-safe, but there's some issues with that. For example, > > what to do if RECENTLY_DEAD tuples don't satisfy a newly added constraint. > > > > One complication in the implementation was the fact that heap_insert > > overwrites the visibility information, and it doesn't write the full > > tuple header to WAL. I ended up implementing a special-purpose > > raw_heap_insert function instead, which is optimized for bulk inserting > > a lot of tuples, knowing that we have exclusive access to the heap. > > raw_heap_insert keeps the current buffer locked over calls, until it > > gets full, and inserts the whole page to WAL as a single record using > > the existing XLOG_HEAP_NEWPAGE record type. > > > > This makes CLUSTER a more viable alternative to VACUUM FULL. One > > motivation for making CLUSTER MVCC-safe is that if some poor soul runs > > pg_dump to make a backup concurrently with CLUSTER, the clustered tables > > will appear to be empty in the dump file. > > > > The documentation doesn't anything about CLUSTER not being MVCC-safe, so > > I suppose there's no need to touch the docs. I sent a doc patch earlier > > to add a note about it, that doc patch should still be applied to older > > release branches, IMO. > > -- >Heikki Linnakangas >EnterpriseDB http://www.enterprisedb.com > > ---(end of broadcast)--- > TIP 6: explain analyze is your friend -- Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PATCHES] Make CLUSTER MVCC-safe
Here's an update, fixing conflict by Tom's recent commit of Simon's patch to skip WAL-inserts when archiving is not enabled. Heikki Linnakangas wrote: This patch makes CLUSTER MVCC-safe. Visibility information and update chains are preserved like in VACUUM FULL. I created a new generic rewriteheap-facility to handle rewriting tables in a visibility-preserving manner. All the update chain tracking is done in rewriteheap.c, the caller is responsible for supplying the stream of tuples. CLUSTER is currently the only user of the facility, but I'm envisioning we might have other users in the future. For example, a version of VACUUM FULL that rewrites the whole table. We could also use it to make ALTER TABLE MVCC-safe, but there's some issues with that. For example, what to do if RECENTLY_DEAD tuples don't satisfy a newly added constraint. One complication in the implementation was the fact that heap_insert overwrites the visibility information, and it doesn't write the full tuple header to WAL. I ended up implementing a special-purpose raw_heap_insert function instead, which is optimized for bulk inserting a lot of tuples, knowing that we have exclusive access to the heap. raw_heap_insert keeps the current buffer locked over calls, until it gets full, and inserts the whole page to WAL as a single record using the existing XLOG_HEAP_NEWPAGE record type. This makes CLUSTER a more viable alternative to VACUUM FULL. One motivation for making CLUSTER MVCC-safe is that if some poor soul runs pg_dump to make a backup concurrently with CLUSTER, the clustered tables will appear to be empty in the dump file. The documentation doesn't anything about CLUSTER not being MVCC-safe, so I suppose there's no need to touch the docs. I sent a doc patch earlier to add a note about it, that doc patch should still be applied to older release branches, IMO. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com Index: src/backend/access/heap/heapam.c === RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/heap/heapam.c,v retrieving revision 1.230 diff -c -r1.230 heapam.c *** src/backend/access/heap/heapam.c 29 Mar 2007 00:15:37 - 1.230 --- src/backend/access/heap/heapam.c 29 Mar 2007 08:27:21 - *** *** 3280,3285 --- 3280,3322 return log_heap_update(reln, oldbuf, from, newbuf, newtup, true); } + /* + * Perofrm XLogInsert of a full page image to WAL. Caller must make sure + * the page contents on disk are consistent with the page inserted to WAL. + */ + XLogRecPtr + log_newpage(RelFileNode *rnode, BlockNumber blkno, Page page) + { + xl_heap_newpage xlrec; + XLogRecPtr recptr; + XLogRecData rdata[2]; + + /* NO ELOG(ERROR) from here till newpage op is logged */ + START_CRIT_SECTION(); + + xlrec.node = *rnode; + xlrec.blkno = blkno; + + rdata[0].data = (char *) &xlrec; + rdata[0].len = SizeOfHeapNewpage; + rdata[0].buffer = InvalidBuffer; + rdata[0].next = &(rdata[1]); + + rdata[1].data = (char *) page; + rdata[1].len = BLCKSZ; + rdata[1].buffer = InvalidBuffer; + rdata[1].next = NULL; + + recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_NEWPAGE, rdata); + + PageSetLSN(page, recptr); + PageSetTLI(page, ThisTimeLineID); + + END_CRIT_SECTION(); + + return recptr; + } + static void heap_xlog_clean(XLogRecPtr lsn, XLogRecord *record) { Index: src/backend/access/nbtree/nbtsort.c === RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/nbtree/nbtsort.c,v retrieving revision 1.110 diff -c -r1.110 nbtsort.c *** src/backend/access/nbtree/nbtsort.c 9 Jan 2007 02:14:10 - 1.110 --- src/backend/access/nbtree/nbtsort.c 20 Mar 2007 17:58:20 - *** *** 64,69 --- 64,70 #include "postgres.h" + #include "access/heapam.h" #include "access/nbtree.h" #include "miscadmin.h" #include "storage/smgr.h" *** *** 265,296 if (wstate->btws_use_wal) { /* We use the heap NEWPAGE record type for this */ ! xl_heap_newpage xlrec; ! XLogRecPtr recptr; ! XLogRecData rdata[2]; ! ! /* NO ELOG(ERROR) from here till newpage op is logged */ ! START_CRIT_SECTION(); ! ! xlrec.node = wstate->index->rd_node; ! xlrec.blkno = blkno; ! ! rdata[0].data = (char *) &xlrec; ! rdata[0].len = SizeOfHeapNewpage; ! rdata[0].buffer = InvalidBuffer; ! rdata[0].next = &(rdata[1]); ! ! rdata[1].data = (char *) page; ! rdata[1].len = BLCKSZ; ! rdata[1].buffer = InvalidBuffer; ! rdata[1].next = NULL; ! ! recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_NEWPAGE, rdata); ! ! PageSetLSN(page, recptr); ! PageSetTLI(page, ThisTimeLineID); ! ! END_CRIT_SECTION(); } else { --- 266,272 if (wstate->btws_use_wal) { /* We use the heap NEWPAGE record type for this */ ! log_newpage(&wstate->index->rd_node, blkno, pa
Re: [PATCHES] Make CLUSTER MVCC-safe
Your patch has been added to the PostgreSQL unapplied patches list at: http://momjian.postgresql.org/cgi-bin/pgpatches It will be applied as soon as one of the PostgreSQL committers reviews and approves it. --- Heikki Linnakangas wrote: > This patch makes CLUSTER MVCC-safe. Visibility information and update > chains are preserved like in VACUUM FULL. > > I created a new generic rewriteheap-facility to handle rewriting tables > in a visibility-preserving manner. All the update chain tracking is done > in rewriteheap.c, the caller is responsible for supplying the stream of > tuples. > > CLUSTER is currently the only user of the facility, but I'm envisioning > we might have other users in the future. For example, a version of > VACUUM FULL that rewrites the whole table. We could also use it to make > ALTER TABLE MVCC-safe, but there's some issues with that. For example, > what to do if RECENTLY_DEAD tuples don't satisfy a newly added constraint. > > One complication in the implementation was the fact that heap_insert > overwrites the visibility information, and it doesn't write the full > tuple header to WAL. I ended up implementing a special-purpose > raw_heap_insert function instead, which is optimized for bulk inserting > a lot of tuples, knowing that we have exclusive access to the heap. > raw_heap_insert keeps the current buffer locked over calls, until it > gets full, and inserts the whole page to WAL as a single record using > the existing XLOG_HEAP_NEWPAGE record type. > > This makes CLUSTER a more viable alternative to VACUUM FULL. One > motivation for making CLUSTER MVCC-safe is that if some poor soul runs > pg_dump to make a backup concurrently with CLUSTER, the clustered tables > will appear to be empty in the dump file. > > The documentation doesn't anything about CLUSTER not being MVCC-safe, so > I suppose there's no need to touch the docs. I sent a doc patch earlier > to add a note about it, that doc patch should still be applied to older > release branches, IMO. > > -- >Heikki Linnakangas >EnterpriseDB http://www.enterprisedb.com > > ---(end of broadcast)--- > TIP 5: don't forget to increase your free space map settings -- Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PATCHES] Make CLUSTER MVCC-safe
On Tue, 2007-03-20 at 18:20 +, Heikki Linnakangas wrote: > This patch makes CLUSTER MVCC-safe. Visibility information and update > chains are preserved like in VACUUM FULL. Sounds good. > CLUSTER is currently the only user of the facility, but I'm envisioning > we might have other users in the future. For example, a version of > VACUUM FULL that rewrites the whole table. I would be very much in favour of that, as discussed on -hackers. There might be some requirement for the older VACUUM FULL behaviour, so I'd like to suggest the syntax: VACUUM FULL tablename [REPLACE | PRESERVE [STORAGE]]; where VACUUM FULL foo REPLACE STORAGE would be the new default, using your new functions, and PRESERVE STORAGE would implement the old method. > One complication in the implementation was the fact that heap_insert > overwrites the visibility information, and it doesn't write the full > tuple header to WAL. I ended up implementing a special-purpose > raw_heap_insert function instead, which is optimized for bulk inserting > a lot of tuples, knowing that we have exclusive access to the heap. > raw_heap_insert keeps the current buffer locked over calls, until it > gets full, and inserts the whole page to WAL as a single record using > the existing XLOG_HEAP_NEWPAGE record type. I submitted Fast CLUSTER patch earlier which avoided writing WAL in the same way that has been done for COPY, CREATE INDEX and CTAS. Would you like to update your patch to do this also, or would you like me to re-write the patch to fit with yours? -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[PATCHES] Make CLUSTER MVCC-safe
This patch makes CLUSTER MVCC-safe. Visibility information and update chains are preserved like in VACUUM FULL. I created a new generic rewriteheap-facility to handle rewriting tables in a visibility-preserving manner. All the update chain tracking is done in rewriteheap.c, the caller is responsible for supplying the stream of tuples. CLUSTER is currently the only user of the facility, but I'm envisioning we might have other users in the future. For example, a version of VACUUM FULL that rewrites the whole table. We could also use it to make ALTER TABLE MVCC-safe, but there's some issues with that. For example, what to do if RECENTLY_DEAD tuples don't satisfy a newly added constraint. One complication in the implementation was the fact that heap_insert overwrites the visibility information, and it doesn't write the full tuple header to WAL. I ended up implementing a special-purpose raw_heap_insert function instead, which is optimized for bulk inserting a lot of tuples, knowing that we have exclusive access to the heap. raw_heap_insert keeps the current buffer locked over calls, until it gets full, and inserts the whole page to WAL as a single record using the existing XLOG_HEAP_NEWPAGE record type. This makes CLUSTER a more viable alternative to VACUUM FULL. One motivation for making CLUSTER MVCC-safe is that if some poor soul runs pg_dump to make a backup concurrently with CLUSTER, the clustered tables will appear to be empty in the dump file. The documentation doesn't anything about CLUSTER not being MVCC-safe, so I suppose there's no need to touch the docs. I sent a doc patch earlier to add a note about it, that doc patch should still be applied to older release branches, IMO. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com Index: src/backend/access/heap/heapam.c === RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/heap/heapam.c,v retrieving revision 1.228 diff -c -r1.228 heapam.c *** src/backend/access/heap/heapam.c 9 Feb 2007 03:35:33 - 1.228 --- src/backend/access/heap/heapam.c 20 Mar 2007 17:55:16 - *** *** 3284,3289 --- 3284,3326 return log_heap_update(reln, oldbuf, from, newbuf, newtup, true); } + /* + * Perofrm XLogInsert of a full page image to WAL. Caller must make sure + * the page contents on disk are consistent with the page inserted to WAL. + */ + XLogRecPtr + log_newpage(RelFileNode *rnode, BlockNumber blkno, Page page) + { + xl_heap_newpage xlrec; + XLogRecPtr recptr; + XLogRecData rdata[2]; + + /* NO ELOG(ERROR) from here till newpage op is logged */ + START_CRIT_SECTION(); + + xlrec.node = *rnode; + xlrec.blkno = blkno; + + rdata[0].data = (char *) &xlrec; + rdata[0].len = SizeOfHeapNewpage; + rdata[0].buffer = InvalidBuffer; + rdata[0].next = &(rdata[1]); + + rdata[1].data = (char *) page; + rdata[1].len = BLCKSZ; + rdata[1].buffer = InvalidBuffer; + rdata[1].next = NULL; + + recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_NEWPAGE, rdata); + + PageSetLSN(page, recptr); + PageSetTLI(page, ThisTimeLineID); + + END_CRIT_SECTION(); + + return recptr; + } + static void heap_xlog_clean(XLogRecPtr lsn, XLogRecord *record) { Index: src/backend/access/nbtree/nbtsort.c === RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/nbtree/nbtsort.c,v retrieving revision 1.110 diff -c -r1.110 nbtsort.c *** src/backend/access/nbtree/nbtsort.c 9 Jan 2007 02:14:10 - 1.110 --- src/backend/access/nbtree/nbtsort.c 20 Mar 2007 17:58:20 - *** *** 64,69 --- 64,70 #include "postgres.h" + #include "access/heapam.h" #include "access/nbtree.h" #include "miscadmin.h" #include "storage/smgr.h" *** *** 265,296 if (wstate->btws_use_wal) { /* We use the heap NEWPAGE record type for this */ ! xl_heap_newpage xlrec; ! XLogRecPtr recptr; ! XLogRecData rdata[2]; ! ! /* NO ELOG(ERROR) from here till newpage op is logged */ ! START_CRIT_SECTION(); ! ! xlrec.node = wstate->index->rd_node; ! xlrec.blkno = blkno; ! ! rdata[0].data = (char *) &xlrec; ! rdata[0].len = SizeOfHeapNewpage; ! rdata[0].buffer = InvalidBuffer; ! rdata[0].next = &(rdata[1]); ! ! rdata[1].data = (char *) page; ! rdata[1].len = BLCKSZ; ! rdata[1].buffer = InvalidBuffer; ! rdata[1].next = NULL; ! ! recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_NEWPAGE, rdata); ! ! PageSetLSN(page, recptr); ! PageSetTLI(page, ThisTimeLineID); ! ! END_CRIT_SECTION(); } else { --- 266,272 if (wstate->btws_use_wal) { /* We use the heap NEWPAGE record type for this */ ! log_newpage(&wstate->index->rd_node, blkno, page); } else { Index: src/backend/commands/Makefile === RCS file: /home/hlinnaka/pg