Re: [PATCHES] Make CLUSTER MVCC-safe

2007-04-07 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
>> This patch makes CLUSTER MVCC-safe. Visibility information and update 
>> chains are preserved like in VACUUM FULL.

> Here's an update, fixing conflict by Tom's recent commit of Simon's 
> patch to skip WAL-inserts when archiving is not enabled.

Applied with revisions.  There were some bugs in it: you need to check
both xmin and tid when determining if one tuple chains to another,
and you can't separate MarkBufferDirty from the critical section that
writes xlog.  (I got around that by not keeping the working page in
buffers at all, the same way btree index build works; should be a bit
faster as well as more correct.)  It had some memory leakage too.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] Make CLUSTER MVCC-safe

2007-04-02 Thread Bruce Momjian
Heikki Linnakangas wrote:
> Alvaro Herrera wrote:
> > Heikki Linnakangas wrote:
> >> Here's an update, fixing conflict by Tom's recent commit of Simon's 
> >> patch to skip WAL-inserts when archiving is not enabled.
> > 
> > Hmm, wouldn't it be better if the rewriteheap.c file was in
> > access/heap/ instead of commands/?
> 
> Yeah, maybe. I thought of it as a subsystem of cluster, and possibly 
> other commands like alter table. But it really is pretty low-level.

So, does the patch need adjustment?  Should we move it before
application?

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] Make CLUSTER MVCC-safe

2007-03-30 Thread Heikki Linnakangas

Alvaro Herrera wrote:

Heikki Linnakangas wrote:
Here's an update, fixing conflict by Tom's recent commit of Simon's 
patch to skip WAL-inserts when archiving is not enabled.


Hmm, wouldn't it be better if the rewriteheap.c file was in
access/heap/ instead of commands/?


Yeah, maybe. I thought of it as a subsystem of cluster, and possibly 
other commands like alter table. But it really is pretty low-level.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PATCHES] Make CLUSTER MVCC-safe

2007-03-29 Thread Alvaro Herrera
Heikki Linnakangas wrote:
> Here's an update, fixing conflict by Tom's recent commit of Simon's 
> patch to skip WAL-inserts when archiving is not enabled.

Hmm, wouldn't it be better if the rewriteheap.c file was in
access/heap/ instead of commands/?



-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PATCHES] Make CLUSTER MVCC-safe

2007-03-29 Thread Bruce Momjian

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---


Heikki Linnakangas wrote:
> Here's an update, fixing conflict by Tom's recent commit of Simon's 
> patch to skip WAL-inserts when archiving is not enabled.
> 
> Heikki Linnakangas wrote:
> > This patch makes CLUSTER MVCC-safe. Visibility information and update 
> > chains are preserved like in VACUUM FULL.
> > 
> > I created a new generic rewriteheap-facility to handle rewriting tables 
> > in a visibility-preserving manner. All the update chain tracking is done 
> > in rewriteheap.c, the caller is responsible for supplying the stream of 
> > tuples.
> > 
> > CLUSTER is currently the only user of the facility, but I'm envisioning 
> > we might have other users in the future. For example, a version of 
> > VACUUM FULL that rewrites the whole table. We could also use it to make 
> > ALTER TABLE MVCC-safe, but there's some issues with that. For example, 
> > what to do if RECENTLY_DEAD tuples don't satisfy a newly added constraint.
> > 
> > One complication in the implementation was the fact that heap_insert 
> > overwrites the visibility information, and it doesn't write the full 
> > tuple header to WAL. I ended up implementing a special-purpose 
> > raw_heap_insert function instead, which is optimized for bulk inserting 
> > a lot of tuples, knowing that we have exclusive access to the heap. 
> > raw_heap_insert keeps the current buffer locked over calls, until it 
> > gets full, and inserts the whole page to WAL as a single record using 
> > the existing XLOG_HEAP_NEWPAGE record type.
> > 
> > This makes CLUSTER a more viable alternative to VACUUM FULL. One 
> > motivation for making CLUSTER MVCC-safe is that if some poor soul runs 
> > pg_dump to make a backup concurrently with CLUSTER, the clustered tables 
> > will appear to be empty in the dump file.
> > 
> > The documentation doesn't anything about CLUSTER not being MVCC-safe, so 
> > I suppose there's no need to touch the docs. I sent a doc patch earlier 
> > to add a note about it, that doc patch should still be applied to older 
> > release branches, IMO.
> 
> -- 
>Heikki Linnakangas
>EnterpriseDB   http://www.enterprisedb.com


> 
> ---(end of broadcast)---
> TIP 6: explain analyze is your friend

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] Make CLUSTER MVCC-safe

2007-03-29 Thread Heikki Linnakangas
Here's an update, fixing conflict by Tom's recent commit of Simon's 
patch to skip WAL-inserts when archiving is not enabled.


Heikki Linnakangas wrote:
This patch makes CLUSTER MVCC-safe. Visibility information and update 
chains are preserved like in VACUUM FULL.


I created a new generic rewriteheap-facility to handle rewriting tables 
in a visibility-preserving manner. All the update chain tracking is done 
in rewriteheap.c, the caller is responsible for supplying the stream of 
tuples.


CLUSTER is currently the only user of the facility, but I'm envisioning 
we might have other users in the future. For example, a version of 
VACUUM FULL that rewrites the whole table. We could also use it to make 
ALTER TABLE MVCC-safe, but there's some issues with that. For example, 
what to do if RECENTLY_DEAD tuples don't satisfy a newly added constraint.


One complication in the implementation was the fact that heap_insert 
overwrites the visibility information, and it doesn't write the full 
tuple header to WAL. I ended up implementing a special-purpose 
raw_heap_insert function instead, which is optimized for bulk inserting 
a lot of tuples, knowing that we have exclusive access to the heap. 
raw_heap_insert keeps the current buffer locked over calls, until it 
gets full, and inserts the whole page to WAL as a single record using 
the existing XLOG_HEAP_NEWPAGE record type.


This makes CLUSTER a more viable alternative to VACUUM FULL. One 
motivation for making CLUSTER MVCC-safe is that if some poor soul runs 
pg_dump to make a backup concurrently with CLUSTER, the clustered tables 
will appear to be empty in the dump file.


The documentation doesn't anything about CLUSTER not being MVCC-safe, so 
I suppose there's no need to touch the docs. I sent a doc patch earlier 
to add a note about it, that doc patch should still be applied to older 
release branches, IMO.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
Index: src/backend/access/heap/heapam.c
===
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/heap/heapam.c,v
retrieving revision 1.230
diff -c -r1.230 heapam.c
*** src/backend/access/heap/heapam.c	29 Mar 2007 00:15:37 -	1.230
--- src/backend/access/heap/heapam.c	29 Mar 2007 08:27:21 -
***
*** 3280,3285 
--- 3280,3322 
  	return log_heap_update(reln, oldbuf, from, newbuf, newtup, true);
  }
  
+ /*
+  * Perofrm XLogInsert of a full page image to WAL. Caller must make sure
+  * the page contents on disk are consistent with the page inserted to WAL.
+  */
+ XLogRecPtr
+ log_newpage(RelFileNode *rnode, BlockNumber blkno, Page page)
+ {
+ 	xl_heap_newpage xlrec;
+ 	XLogRecPtr	recptr;
+ 	XLogRecData rdata[2];
+ 
+ 	/* NO ELOG(ERROR) from here till newpage op is logged */
+ 	START_CRIT_SECTION();
+ 
+ 	xlrec.node = *rnode;
+ 	xlrec.blkno = blkno;
+ 
+ 	rdata[0].data = (char *) &xlrec;
+ 	rdata[0].len = SizeOfHeapNewpage;
+ 	rdata[0].buffer = InvalidBuffer;
+ 	rdata[0].next = &(rdata[1]);
+ 
+ 	rdata[1].data = (char *) page;
+ 	rdata[1].len = BLCKSZ;
+ 	rdata[1].buffer = InvalidBuffer;
+ 	rdata[1].next = NULL;
+ 
+ 	recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_NEWPAGE, rdata);
+ 
+ 	PageSetLSN(page, recptr);
+ 	PageSetTLI(page, ThisTimeLineID);
+ 
+ 	END_CRIT_SECTION();
+ 
+ 	return recptr;
+ }
+ 
  static void
  heap_xlog_clean(XLogRecPtr lsn, XLogRecord *record)
  {
Index: src/backend/access/nbtree/nbtsort.c
===
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/nbtree/nbtsort.c,v
retrieving revision 1.110
diff -c -r1.110 nbtsort.c
*** src/backend/access/nbtree/nbtsort.c	9 Jan 2007 02:14:10 -	1.110
--- src/backend/access/nbtree/nbtsort.c	20 Mar 2007 17:58:20 -
***
*** 64,69 
--- 64,70 
  
  #include "postgres.h"
  
+ #include "access/heapam.h"
  #include "access/nbtree.h"
  #include "miscadmin.h"
  #include "storage/smgr.h"
***
*** 265,296 
  	if (wstate->btws_use_wal)
  	{
  		/* We use the heap NEWPAGE record type for this */
! 		xl_heap_newpage xlrec;
! 		XLogRecPtr	recptr;
! 		XLogRecData rdata[2];
! 
! 		/* NO ELOG(ERROR) from here till newpage op is logged */
! 		START_CRIT_SECTION();
! 
! 		xlrec.node = wstate->index->rd_node;
! 		xlrec.blkno = blkno;
! 
! 		rdata[0].data = (char *) &xlrec;
! 		rdata[0].len = SizeOfHeapNewpage;
! 		rdata[0].buffer = InvalidBuffer;
! 		rdata[0].next = &(rdata[1]);
! 
! 		rdata[1].data = (char *) page;
! 		rdata[1].len = BLCKSZ;
! 		rdata[1].buffer = InvalidBuffer;
! 		rdata[1].next = NULL;
! 
! 		recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_NEWPAGE, rdata);
! 
! 		PageSetLSN(page, recptr);
! 		PageSetTLI(page, ThisTimeLineID);
! 
! 		END_CRIT_SECTION();
  	}
  	else
  	{
--- 266,272 
  	if (wstate->btws_use_wal)
  	{
  		/* We use the heap NEWPAGE record type for this */
! 		log_newpage(&wstate->index->rd_node, blkno, pa

Re: [PATCHES] Make CLUSTER MVCC-safe

2007-03-21 Thread Bruce Momjian

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---


Heikki Linnakangas wrote:
> This patch makes CLUSTER MVCC-safe. Visibility information and update 
> chains are preserved like in VACUUM FULL.
> 
> I created a new generic rewriteheap-facility to handle rewriting tables 
> in a visibility-preserving manner. All the update chain tracking is done 
> in rewriteheap.c, the caller is responsible for supplying the stream of 
> tuples.
> 
> CLUSTER is currently the only user of the facility, but I'm envisioning 
> we might have other users in the future. For example, a version of 
> VACUUM FULL that rewrites the whole table. We could also use it to make 
> ALTER TABLE MVCC-safe, but there's some issues with that. For example, 
> what to do if RECENTLY_DEAD tuples don't satisfy a newly added constraint.
> 
> One complication in the implementation was the fact that heap_insert 
> overwrites the visibility information, and it doesn't write the full 
> tuple header to WAL. I ended up implementing a special-purpose 
> raw_heap_insert function instead, which is optimized for bulk inserting 
> a lot of tuples, knowing that we have exclusive access to the heap. 
> raw_heap_insert keeps the current buffer locked over calls, until it 
> gets full, and inserts the whole page to WAL as a single record using 
> the existing XLOG_HEAP_NEWPAGE record type.
> 
> This makes CLUSTER a more viable alternative to VACUUM FULL. One 
> motivation for making CLUSTER MVCC-safe is that if some poor soul runs 
> pg_dump to make a backup concurrently with CLUSTER, the clustered tables 
> will appear to be empty in the dump file.
> 
> The documentation doesn't anything about CLUSTER not being MVCC-safe, so 
> I suppose there's no need to touch the docs. I sent a doc patch earlier 
> to add a note about it, that doc patch should still be applied to older 
> release branches, IMO.
> 
> -- 
>Heikki Linnakangas
>EnterpriseDB   http://www.enterprisedb.com


> 
> ---(end of broadcast)---
> TIP 5: don't forget to increase your free space map settings

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] Make CLUSTER MVCC-safe

2007-03-21 Thread Simon Riggs
On Tue, 2007-03-20 at 18:20 +, Heikki Linnakangas wrote:
> This patch makes CLUSTER MVCC-safe. Visibility information and update 
> chains are preserved like in VACUUM FULL.

Sounds good.

> CLUSTER is currently the only user of the facility, but I'm envisioning 
> we might have other users in the future. For example, a version of 
> VACUUM FULL that rewrites the whole table. 

I would be very much in favour of that, as discussed on -hackers.

There might be some requirement for the older VACUUM FULL behaviour, so
I'd like to suggest the syntax:

VACUUM FULL tablename [REPLACE | PRESERVE [STORAGE]];

where VACUUM FULL foo REPLACE STORAGE would be the new default, using
your new functions, and PRESERVE STORAGE would implement the old method.

> One complication in the implementation was the fact that heap_insert 
> overwrites the visibility information, and it doesn't write the full 
> tuple header to WAL. I ended up implementing a special-purpose 
> raw_heap_insert function instead, which is optimized for bulk inserting 
> a lot of tuples, knowing that we have exclusive access to the heap. 
> raw_heap_insert keeps the current buffer locked over calls, until it 
> gets full, and inserts the whole page to WAL as a single record using 
> the existing XLOG_HEAP_NEWPAGE record type.

I submitted Fast CLUSTER patch earlier which avoided writing WAL in the
same way that has been done for COPY, CREATE INDEX and CTAS. Would you
like to update your patch to do this also, or would you like me to
re-write the patch to fit with yours?

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[PATCHES] Make CLUSTER MVCC-safe

2007-03-20 Thread Heikki Linnakangas
This patch makes CLUSTER MVCC-safe. Visibility information and update 
chains are preserved like in VACUUM FULL.


I created a new generic rewriteheap-facility to handle rewriting tables 
in a visibility-preserving manner. All the update chain tracking is done 
in rewriteheap.c, the caller is responsible for supplying the stream of 
tuples.


CLUSTER is currently the only user of the facility, but I'm envisioning 
we might have other users in the future. For example, a version of 
VACUUM FULL that rewrites the whole table. We could also use it to make 
ALTER TABLE MVCC-safe, but there's some issues with that. For example, 
what to do if RECENTLY_DEAD tuples don't satisfy a newly added constraint.


One complication in the implementation was the fact that heap_insert 
overwrites the visibility information, and it doesn't write the full 
tuple header to WAL. I ended up implementing a special-purpose 
raw_heap_insert function instead, which is optimized for bulk inserting 
a lot of tuples, knowing that we have exclusive access to the heap. 
raw_heap_insert keeps the current buffer locked over calls, until it 
gets full, and inserts the whole page to WAL as a single record using 
the existing XLOG_HEAP_NEWPAGE record type.


This makes CLUSTER a more viable alternative to VACUUM FULL. One 
motivation for making CLUSTER MVCC-safe is that if some poor soul runs 
pg_dump to make a backup concurrently with CLUSTER, the clustered tables 
will appear to be empty in the dump file.


The documentation doesn't anything about CLUSTER not being MVCC-safe, so 
I suppose there's no need to touch the docs. I sent a doc patch earlier 
to add a note about it, that doc patch should still be applied to older 
release branches, IMO.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
Index: src/backend/access/heap/heapam.c
===
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/heap/heapam.c,v
retrieving revision 1.228
diff -c -r1.228 heapam.c
*** src/backend/access/heap/heapam.c	9 Feb 2007 03:35:33 -	1.228
--- src/backend/access/heap/heapam.c	20 Mar 2007 17:55:16 -
***
*** 3284,3289 
--- 3284,3326 
  	return log_heap_update(reln, oldbuf, from, newbuf, newtup, true);
  }
  
+ /*
+  * Perofrm XLogInsert of a full page image to WAL. Caller must make sure
+  * the page contents on disk are consistent with the page inserted to WAL.
+  */
+ XLogRecPtr
+ log_newpage(RelFileNode *rnode, BlockNumber blkno, Page page)
+ {
+ 	xl_heap_newpage xlrec;
+ 	XLogRecPtr	recptr;
+ 	XLogRecData rdata[2];
+ 
+ 	/* NO ELOG(ERROR) from here till newpage op is logged */
+ 	START_CRIT_SECTION();
+ 
+ 	xlrec.node = *rnode;
+ 	xlrec.blkno = blkno;
+ 
+ 	rdata[0].data = (char *) &xlrec;
+ 	rdata[0].len = SizeOfHeapNewpage;
+ 	rdata[0].buffer = InvalidBuffer;
+ 	rdata[0].next = &(rdata[1]);
+ 
+ 	rdata[1].data = (char *) page;
+ 	rdata[1].len = BLCKSZ;
+ 	rdata[1].buffer = InvalidBuffer;
+ 	rdata[1].next = NULL;
+ 
+ 	recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_NEWPAGE, rdata);
+ 
+ 	PageSetLSN(page, recptr);
+ 	PageSetTLI(page, ThisTimeLineID);
+ 
+ 	END_CRIT_SECTION();
+ 
+ 	return recptr;
+ }
+ 
  static void
  heap_xlog_clean(XLogRecPtr lsn, XLogRecord *record)
  {
Index: src/backend/access/nbtree/nbtsort.c
===
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/nbtree/nbtsort.c,v
retrieving revision 1.110
diff -c -r1.110 nbtsort.c
*** src/backend/access/nbtree/nbtsort.c	9 Jan 2007 02:14:10 -	1.110
--- src/backend/access/nbtree/nbtsort.c	20 Mar 2007 17:58:20 -
***
*** 64,69 
--- 64,70 
  
  #include "postgres.h"
  
+ #include "access/heapam.h"
  #include "access/nbtree.h"
  #include "miscadmin.h"
  #include "storage/smgr.h"
***
*** 265,296 
  	if (wstate->btws_use_wal)
  	{
  		/* We use the heap NEWPAGE record type for this */
! 		xl_heap_newpage xlrec;
! 		XLogRecPtr	recptr;
! 		XLogRecData rdata[2];
! 
! 		/* NO ELOG(ERROR) from here till newpage op is logged */
! 		START_CRIT_SECTION();
! 
! 		xlrec.node = wstate->index->rd_node;
! 		xlrec.blkno = blkno;
! 
! 		rdata[0].data = (char *) &xlrec;
! 		rdata[0].len = SizeOfHeapNewpage;
! 		rdata[0].buffer = InvalidBuffer;
! 		rdata[0].next = &(rdata[1]);
! 
! 		rdata[1].data = (char *) page;
! 		rdata[1].len = BLCKSZ;
! 		rdata[1].buffer = InvalidBuffer;
! 		rdata[1].next = NULL;
! 
! 		recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_NEWPAGE, rdata);
! 
! 		PageSetLSN(page, recptr);
! 		PageSetTLI(page, ThisTimeLineID);
! 
! 		END_CRIT_SECTION();
  	}
  	else
  	{
--- 266,272 
  	if (wstate->btws_use_wal)
  	{
  		/* We use the heap NEWPAGE record type for this */
! 		log_newpage(&wstate->index->rd_node, blkno, page);
  	}
  	else
  	{
Index: src/backend/commands/Makefile
===
RCS file: /home/hlinnaka/pg