Re: [PATCHES] Make CLUSTER MVCC-safe

2007-04-07 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 This patch makes CLUSTER MVCC-safe. Visibility information and update 
 chains are preserved like in VACUUM FULL.

 Here's an update, fixing conflict by Tom's recent commit of Simon's 
 patch to skip WAL-inserts when archiving is not enabled.

Applied with revisions.  There were some bugs in it: you need to check
both xmin and tid when determining if one tuple chains to another,
and you can't separate MarkBufferDirty from the critical section that
writes xlog.  (I got around that by not keeping the working page in
buffers at all, the same way btree index build works; should be a bit
faster as well as more correct.)  It had some memory leakage too.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] Make CLUSTER MVCC-safe

2007-04-02 Thread Bruce Momjian
Heikki Linnakangas wrote:
 Alvaro Herrera wrote:
  Heikki Linnakangas wrote:
  Here's an update, fixing conflict by Tom's recent commit of Simon's 
  patch to skip WAL-inserts when archiving is not enabled.
  
  Hmm, wouldn't it be better if the rewriteheap.c file was in
  access/heap/ instead of commands/?
 
 Yeah, maybe. I thought of it as a subsystem of cluster, and possibly 
 other commands like alter table. But it really is pretty low-level.

So, does the patch need adjustment?  Should we move it before
application?

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] Make CLUSTER MVCC-safe

2007-03-30 Thread Heikki Linnakangas

Alvaro Herrera wrote:

Heikki Linnakangas wrote:
Here's an update, fixing conflict by Tom's recent commit of Simon's 
patch to skip WAL-inserts when archiving is not enabled.


Hmm, wouldn't it be better if the rewriteheap.c file was in
access/heap/ instead of commands/?


Yeah, maybe. I thought of it as a subsystem of cluster, and possibly 
other commands like alter table. But it really is pretty low-level.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PATCHES] Make CLUSTER MVCC-safe

2007-03-29 Thread Heikki Linnakangas
Here's an update, fixing conflict by Tom's recent commit of Simon's 
patch to skip WAL-inserts when archiving is not enabled.


Heikki Linnakangas wrote:
This patch makes CLUSTER MVCC-safe. Visibility information and update 
chains are preserved like in VACUUM FULL.


I created a new generic rewriteheap-facility to handle rewriting tables 
in a visibility-preserving manner. All the update chain tracking is done 
in rewriteheap.c, the caller is responsible for supplying the stream of 
tuples.


CLUSTER is currently the only user of the facility, but I'm envisioning 
we might have other users in the future. For example, a version of 
VACUUM FULL that rewrites the whole table. We could also use it to make 
ALTER TABLE MVCC-safe, but there's some issues with that. For example, 
what to do if RECENTLY_DEAD tuples don't satisfy a newly added constraint.


One complication in the implementation was the fact that heap_insert 
overwrites the visibility information, and it doesn't write the full 
tuple header to WAL. I ended up implementing a special-purpose 
raw_heap_insert function instead, which is optimized for bulk inserting 
a lot of tuples, knowing that we have exclusive access to the heap. 
raw_heap_insert keeps the current buffer locked over calls, until it 
gets full, and inserts the whole page to WAL as a single record using 
the existing XLOG_HEAP_NEWPAGE record type.


This makes CLUSTER a more viable alternative to VACUUM FULL. One 
motivation for making CLUSTER MVCC-safe is that if some poor soul runs 
pg_dump to make a backup concurrently with CLUSTER, the clustered tables 
will appear to be empty in the dump file.


The documentation doesn't anything about CLUSTER not being MVCC-safe, so 
I suppose there's no need to touch the docs. I sent a doc patch earlier 
to add a note about it, that doc patch should still be applied to older 
release branches, IMO.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
Index: src/backend/access/heap/heapam.c
===
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/heap/heapam.c,v
retrieving revision 1.230
diff -c -r1.230 heapam.c
*** src/backend/access/heap/heapam.c	29 Mar 2007 00:15:37 -	1.230
--- src/backend/access/heap/heapam.c	29 Mar 2007 08:27:21 -
***
*** 3280,3285 
--- 3280,3322 
  	return log_heap_update(reln, oldbuf, from, newbuf, newtup, true);
  }
  
+ /*
+  * Perofrm XLogInsert of a full page image to WAL. Caller must make sure
+  * the page contents on disk are consistent with the page inserted to WAL.
+  */
+ XLogRecPtr
+ log_newpage(RelFileNode *rnode, BlockNumber blkno, Page page)
+ {
+ 	xl_heap_newpage xlrec;
+ 	XLogRecPtr	recptr;
+ 	XLogRecData rdata[2];
+ 
+ 	/* NO ELOG(ERROR) from here till newpage op is logged */
+ 	START_CRIT_SECTION();
+ 
+ 	xlrec.node = *rnode;
+ 	xlrec.blkno = blkno;
+ 
+ 	rdata[0].data = (char *) xlrec;
+ 	rdata[0].len = SizeOfHeapNewpage;
+ 	rdata[0].buffer = InvalidBuffer;
+ 	rdata[0].next = (rdata[1]);
+ 
+ 	rdata[1].data = (char *) page;
+ 	rdata[1].len = BLCKSZ;
+ 	rdata[1].buffer = InvalidBuffer;
+ 	rdata[1].next = NULL;
+ 
+ 	recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_NEWPAGE, rdata);
+ 
+ 	PageSetLSN(page, recptr);
+ 	PageSetTLI(page, ThisTimeLineID);
+ 
+ 	END_CRIT_SECTION();
+ 
+ 	return recptr;
+ }
+ 
  static void
  heap_xlog_clean(XLogRecPtr lsn, XLogRecord *record)
  {
Index: src/backend/access/nbtree/nbtsort.c
===
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/nbtree/nbtsort.c,v
retrieving revision 1.110
diff -c -r1.110 nbtsort.c
*** src/backend/access/nbtree/nbtsort.c	9 Jan 2007 02:14:10 -	1.110
--- src/backend/access/nbtree/nbtsort.c	20 Mar 2007 17:58:20 -
***
*** 64,69 
--- 64,70 
  
  #include postgres.h
  
+ #include access/heapam.h
  #include access/nbtree.h
  #include miscadmin.h
  #include storage/smgr.h
***
*** 265,296 
  	if (wstate-btws_use_wal)
  	{
  		/* We use the heap NEWPAGE record type for this */
! 		xl_heap_newpage xlrec;
! 		XLogRecPtr	recptr;
! 		XLogRecData rdata[2];
! 
! 		/* NO ELOG(ERROR) from here till newpage op is logged */
! 		START_CRIT_SECTION();
! 
! 		xlrec.node = wstate-index-rd_node;
! 		xlrec.blkno = blkno;
! 
! 		rdata[0].data = (char *) xlrec;
! 		rdata[0].len = SizeOfHeapNewpage;
! 		rdata[0].buffer = InvalidBuffer;
! 		rdata[0].next = (rdata[1]);
! 
! 		rdata[1].data = (char *) page;
! 		rdata[1].len = BLCKSZ;
! 		rdata[1].buffer = InvalidBuffer;
! 		rdata[1].next = NULL;
! 
! 		recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_NEWPAGE, rdata);
! 
! 		PageSetLSN(page, recptr);
! 		PageSetTLI(page, ThisTimeLineID);
! 
! 		END_CRIT_SECTION();
  	}
  	else
  	{
--- 266,272 
  	if (wstate-btws_use_wal)
  	{
  		/* We use the heap NEWPAGE record type for this */
! 		log_newpage(wstate-index-rd_node, blkno, page);
  	}
  	else
  	

Re: [PATCHES] Make CLUSTER MVCC-safe

2007-03-29 Thread Bruce Momjian

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---


Heikki Linnakangas wrote:
 Here's an update, fixing conflict by Tom's recent commit of Simon's 
 patch to skip WAL-inserts when archiving is not enabled.
 
 Heikki Linnakangas wrote:
  This patch makes CLUSTER MVCC-safe. Visibility information and update 
  chains are preserved like in VACUUM FULL.
  
  I created a new generic rewriteheap-facility to handle rewriting tables 
  in a visibility-preserving manner. All the update chain tracking is done 
  in rewriteheap.c, the caller is responsible for supplying the stream of 
  tuples.
  
  CLUSTER is currently the only user of the facility, but I'm envisioning 
  we might have other users in the future. For example, a version of 
  VACUUM FULL that rewrites the whole table. We could also use it to make 
  ALTER TABLE MVCC-safe, but there's some issues with that. For example, 
  what to do if RECENTLY_DEAD tuples don't satisfy a newly added constraint.
  
  One complication in the implementation was the fact that heap_insert 
  overwrites the visibility information, and it doesn't write the full 
  tuple header to WAL. I ended up implementing a special-purpose 
  raw_heap_insert function instead, which is optimized for bulk inserting 
  a lot of tuples, knowing that we have exclusive access to the heap. 
  raw_heap_insert keeps the current buffer locked over calls, until it 
  gets full, and inserts the whole page to WAL as a single record using 
  the existing XLOG_HEAP_NEWPAGE record type.
  
  This makes CLUSTER a more viable alternative to VACUUM FULL. One 
  motivation for making CLUSTER MVCC-safe is that if some poor soul runs 
  pg_dump to make a backup concurrently with CLUSTER, the clustered tables 
  will appear to be empty in the dump file.
  
  The documentation doesn't anything about CLUSTER not being MVCC-safe, so 
  I suppose there's no need to touch the docs. I sent a doc patch earlier 
  to add a note about it, that doc patch should still be applied to older 
  release branches, IMO.
 
 -- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com


 
 ---(end of broadcast)---
 TIP 6: explain analyze is your friend

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] Make CLUSTER MVCC-safe

2007-03-29 Thread Alvaro Herrera
Heikki Linnakangas wrote:
 Here's an update, fixing conflict by Tom's recent commit of Simon's 
 patch to skip WAL-inserts when archiving is not enabled.

Hmm, wouldn't it be better if the rewriteheap.c file was in
access/heap/ instead of commands/?



-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PATCHES] Make CLUSTER MVCC-safe

2007-03-21 Thread Simon Riggs
On Tue, 2007-03-20 at 18:20 +, Heikki Linnakangas wrote:
 This patch makes CLUSTER MVCC-safe. Visibility information and update 
 chains are preserved like in VACUUM FULL.

Sounds good.

 CLUSTER is currently the only user of the facility, but I'm envisioning 
 we might have other users in the future. For example, a version of 
 VACUUM FULL that rewrites the whole table. 

I would be very much in favour of that, as discussed on -hackers.

There might be some requirement for the older VACUUM FULL behaviour, so
I'd like to suggest the syntax:

VACUUM FULL tablename [REPLACE | PRESERVE [STORAGE]];

where VACUUM FULL foo REPLACE STORAGE would be the new default, using
your new functions, and PRESERVE STORAGE would implement the old method.

 One complication in the implementation was the fact that heap_insert 
 overwrites the visibility information, and it doesn't write the full 
 tuple header to WAL. I ended up implementing a special-purpose 
 raw_heap_insert function instead, which is optimized for bulk inserting 
 a lot of tuples, knowing that we have exclusive access to the heap. 
 raw_heap_insert keeps the current buffer locked over calls, until it 
 gets full, and inserts the whole page to WAL as a single record using 
 the existing XLOG_HEAP_NEWPAGE record type.

I submitted Fast CLUSTER patch earlier which avoided writing WAL in the
same way that has been done for COPY, CREATE INDEX and CTAS. Would you
like to update your patch to do this also, or would you like me to
re-write the patch to fit with yours?

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] Make CLUSTER MVCC-safe

2007-03-21 Thread Bruce Momjian

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---


Heikki Linnakangas wrote:
 This patch makes CLUSTER MVCC-safe. Visibility information and update 
 chains are preserved like in VACUUM FULL.
 
 I created a new generic rewriteheap-facility to handle rewriting tables 
 in a visibility-preserving manner. All the update chain tracking is done 
 in rewriteheap.c, the caller is responsible for supplying the stream of 
 tuples.
 
 CLUSTER is currently the only user of the facility, but I'm envisioning 
 we might have other users in the future. For example, a version of 
 VACUUM FULL that rewrites the whole table. We could also use it to make 
 ALTER TABLE MVCC-safe, but there's some issues with that. For example, 
 what to do if RECENTLY_DEAD tuples don't satisfy a newly added constraint.
 
 One complication in the implementation was the fact that heap_insert 
 overwrites the visibility information, and it doesn't write the full 
 tuple header to WAL. I ended up implementing a special-purpose 
 raw_heap_insert function instead, which is optimized for bulk inserting 
 a lot of tuples, knowing that we have exclusive access to the heap. 
 raw_heap_insert keeps the current buffer locked over calls, until it 
 gets full, and inserts the whole page to WAL as a single record using 
 the existing XLOG_HEAP_NEWPAGE record type.
 
 This makes CLUSTER a more viable alternative to VACUUM FULL. One 
 motivation for making CLUSTER MVCC-safe is that if some poor soul runs 
 pg_dump to make a backup concurrently with CLUSTER, the clustered tables 
 will appear to be empty in the dump file.
 
 The documentation doesn't anything about CLUSTER not being MVCC-safe, so 
 I suppose there's no need to touch the docs. I sent a doc patch earlier 
 to add a note about it, that doc patch should still be applied to older 
 release branches, IMO.
 
 -- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com


 
 ---(end of broadcast)---
 TIP 5: don't forget to increase your free space map settings

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[PATCHES] Make CLUSTER MVCC-safe

2007-03-20 Thread Heikki Linnakangas
This patch makes CLUSTER MVCC-safe. Visibility information and update 
chains are preserved like in VACUUM FULL.


I created a new generic rewriteheap-facility to handle rewriting tables 
in a visibility-preserving manner. All the update chain tracking is done 
in rewriteheap.c, the caller is responsible for supplying the stream of 
tuples.


CLUSTER is currently the only user of the facility, but I'm envisioning 
we might have other users in the future. For example, a version of 
VACUUM FULL that rewrites the whole table. We could also use it to make 
ALTER TABLE MVCC-safe, but there's some issues with that. For example, 
what to do if RECENTLY_DEAD tuples don't satisfy a newly added constraint.


One complication in the implementation was the fact that heap_insert 
overwrites the visibility information, and it doesn't write the full 
tuple header to WAL. I ended up implementing a special-purpose 
raw_heap_insert function instead, which is optimized for bulk inserting 
a lot of tuples, knowing that we have exclusive access to the heap. 
raw_heap_insert keeps the current buffer locked over calls, until it 
gets full, and inserts the whole page to WAL as a single record using 
the existing XLOG_HEAP_NEWPAGE record type.


This makes CLUSTER a more viable alternative to VACUUM FULL. One 
motivation for making CLUSTER MVCC-safe is that if some poor soul runs 
pg_dump to make a backup concurrently with CLUSTER, the clustered tables 
will appear to be empty in the dump file.


The documentation doesn't anything about CLUSTER not being MVCC-safe, so 
I suppose there's no need to touch the docs. I sent a doc patch earlier 
to add a note about it, that doc patch should still be applied to older 
release branches, IMO.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
Index: src/backend/access/heap/heapam.c
===
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/heap/heapam.c,v
retrieving revision 1.228
diff -c -r1.228 heapam.c
*** src/backend/access/heap/heapam.c	9 Feb 2007 03:35:33 -	1.228
--- src/backend/access/heap/heapam.c	20 Mar 2007 17:55:16 -
***
*** 3284,3289 
--- 3284,3326 
  	return log_heap_update(reln, oldbuf, from, newbuf, newtup, true);
  }
  
+ /*
+  * Perofrm XLogInsert of a full page image to WAL. Caller must make sure
+  * the page contents on disk are consistent with the page inserted to WAL.
+  */
+ XLogRecPtr
+ log_newpage(RelFileNode *rnode, BlockNumber blkno, Page page)
+ {
+ 	xl_heap_newpage xlrec;
+ 	XLogRecPtr	recptr;
+ 	XLogRecData rdata[2];
+ 
+ 	/* NO ELOG(ERROR) from here till newpage op is logged */
+ 	START_CRIT_SECTION();
+ 
+ 	xlrec.node = *rnode;
+ 	xlrec.blkno = blkno;
+ 
+ 	rdata[0].data = (char *) xlrec;
+ 	rdata[0].len = SizeOfHeapNewpage;
+ 	rdata[0].buffer = InvalidBuffer;
+ 	rdata[0].next = (rdata[1]);
+ 
+ 	rdata[1].data = (char *) page;
+ 	rdata[1].len = BLCKSZ;
+ 	rdata[1].buffer = InvalidBuffer;
+ 	rdata[1].next = NULL;
+ 
+ 	recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_NEWPAGE, rdata);
+ 
+ 	PageSetLSN(page, recptr);
+ 	PageSetTLI(page, ThisTimeLineID);
+ 
+ 	END_CRIT_SECTION();
+ 
+ 	return recptr;
+ }
+ 
  static void
  heap_xlog_clean(XLogRecPtr lsn, XLogRecord *record)
  {
Index: src/backend/access/nbtree/nbtsort.c
===
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/nbtree/nbtsort.c,v
retrieving revision 1.110
diff -c -r1.110 nbtsort.c
*** src/backend/access/nbtree/nbtsort.c	9 Jan 2007 02:14:10 -	1.110
--- src/backend/access/nbtree/nbtsort.c	20 Mar 2007 17:58:20 -
***
*** 64,69 
--- 64,70 
  
  #include postgres.h
  
+ #include access/heapam.h
  #include access/nbtree.h
  #include miscadmin.h
  #include storage/smgr.h
***
*** 265,296 
  	if (wstate-btws_use_wal)
  	{
  		/* We use the heap NEWPAGE record type for this */
! 		xl_heap_newpage xlrec;
! 		XLogRecPtr	recptr;
! 		XLogRecData rdata[2];
! 
! 		/* NO ELOG(ERROR) from here till newpage op is logged */
! 		START_CRIT_SECTION();
! 
! 		xlrec.node = wstate-index-rd_node;
! 		xlrec.blkno = blkno;
! 
! 		rdata[0].data = (char *) xlrec;
! 		rdata[0].len = SizeOfHeapNewpage;
! 		rdata[0].buffer = InvalidBuffer;
! 		rdata[0].next = (rdata[1]);
! 
! 		rdata[1].data = (char *) page;
! 		rdata[1].len = BLCKSZ;
! 		rdata[1].buffer = InvalidBuffer;
! 		rdata[1].next = NULL;
! 
! 		recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_NEWPAGE, rdata);
! 
! 		PageSetLSN(page, recptr);
! 		PageSetTLI(page, ThisTimeLineID);
! 
! 		END_CRIT_SECTION();
  	}
  	else
  	{
--- 266,272 
  	if (wstate-btws_use_wal)
  	{
  		/* We use the heap NEWPAGE record type for this */
! 		log_newpage(wstate-index-rd_node, blkno, page);
  	}
  	else
  	{
Index: src/backend/commands/Makefile
===
RCS file: