During recent tuning of the TPC-C workload, I produced the following
patch to force COPY to leave some space in each data block when it loads
data into heap relations.

Rationale for this was to improve performance of random UPDATE
statements against a table too large to fit in memory. That activity is
joint-equal most frequent single statement type. [tpcc.Stock table, for
those who know the workload]. 

Each UPDATE tries to insert a new row version. The blocks are all full,
so each UPDATE causes this activity:
        read block, decide cannot insert, unlock block
        extend relation by 1 block
        relock first block, lock new block
        insert row into new block
After the patch, most UPDATEs cause only a single block access/update,
excluding the index effects.

This patch is essentially the "bottom-half" of what is needed to make
FILLFACTOR work for heap relations. There is no syntax/command
statements coded at this time...a recoded executable was used to reload
the STOCK table before testing with a fixed padding of 1000 bytes, or
roughly FILLFACTOR 88.

Index relations might also be added, though it was less clear to me at
the time I wrote the patch that such an approach would bring performance
improvement.

The main discussion now is probably where the FILLFACTOR should be
specified. Should it be:
a)      COPY table FROM .... WITH FILLFACTOR 90
which would fill datablocks for *this copy only* at 90% full

b)      ALTER TABLE table SET FILLFACTOR 90
which would set the table so that all COPY commands used to load data
into it would load heap blocks at 90% full

Implementing an extension to the COPY command seemed easier than trying
to fiddle with the system catalogs to allow this extra factor be
recorded for ever, when it is only used in conjunction with a COPY
statement anyway. 

The patch was produced quickly to assist tuning efforts during
Scalability & Performance benchmarking of PostgreSQL 8.0 carried out at
Unisys Corporation's Mission Viejo engineering laboratory. The
development was sponsored by Unisys Corporation and the patch has now
been donated to the PostgreSQL community under the standard
PostgreSQL/BSD licence. Approval for release of this code has been given
in writing to me by the Director, Open Runtime Products, Unisys on April
8, 2005.

If anybody would like to complete the work started here, I'd be most
grateful - there are some other items I need to work on for 8.1 ahead of
this now.

Best Regards, Simon Riggs
diff -c -r postgresql-8.0.1/src/backend/access/heap/heapam.c postgresql-8.0.1-wip/src/backend/access/heap/heapam.c
*** postgresql-8.0.1/src/backend/access/heap/heapam.c	2005-02-03 10:38:06.000000000 -0800
--- postgresql-8.0.1-wip/src/backend/access/heap/heapam.c	2005-02-10 04:50:33.832304712 -0800
***************
*** 1101,1107 ****
   * command ID.
   */
  Oid
! heap_insert(Relation relation, HeapTuple tup, CommandId cid)
  {
  	TransactionId xid = GetCurrentTransactionId();
  	Buffer		buffer;
--- 1101,1107 ----
   * command ID.
   */
  Oid
! heap_insert(Relation relation, HeapTuple tup, CommandId cid, Size requiredFreespace)
  {
  	TransactionId xid = GetCurrentTransactionId();
  	Buffer		buffer;
***************
*** 1151,1157 ****
  		heap_tuple_toast_attrs(relation, tup, NULL);
  
  	/* Find buffer to insert this tuple into */
! 	buffer = RelationGetBufferForTuple(relation, tup->t_len, InvalidBuffer);
  
  	/* NO EREPORT(ERROR) from here till changes are logged */
  	START_CRIT_SECTION();
--- 1151,1157 ----
  		heap_tuple_toast_attrs(relation, tup, NULL);
  
  	/* Find buffer to insert this tuple into */
! 	buffer = RelationGetBufferForTuple(relation, tup->t_len, InvalidBuffer, requiredFreespace);
  
  	/* NO EREPORT(ERROR) from here till changes are logged */
  	START_CRIT_SECTION();
***************
*** 1242,1255 ****
   * Currently, this routine differs from heap_insert only in supplying
   * a default command ID.  But it should be used rather than using
   * heap_insert directly in most places where we are modifying system catalogs.
   */
  Oid
  simple_heap_insert(Relation relation, HeapTuple tup)
  {
! 	return heap_insert(relation, tup, GetCurrentCommandId());
  }
  
  /*
   *	heap_delete		- delete a tuple
   *
   * NB: do not call this directly unless you are prepared to deal with
--- 1242,1273 ----
   * Currently, this routine differs from heap_insert only in supplying
   * a default command ID.  But it should be used rather than using
   * heap_insert directly in most places where we are modifying system catalogs.
+  * 
+  * Leave no freespace in block when inserting...
   */
  Oid
  simple_heap_insert(Relation relation, HeapTuple tup)
  {
! 	return heap_insert(relation, tup, GetCurrentCommandId(), zeroFreespace);
  }
  
  /*
+  *	partialfill_heap_insert - insert a tuple
+  *
+  * Similar to simple_heap_insert, except for...
+  * 
+  * Caller decides how much freespace to leave in page...
+  */
+ Oid
+ partialfill_heap_insert(Relation relation, HeapTuple tup, Size requiredFreespace)
+ {
+ 	return heap_insert(relation, tup, GetCurrentCommandId(), requiredFreespace);
+ }
+ 
+ 
+ /*
   *	heap_delete		- delete a tuple
   *
   * NB: do not call this directly unless you are prepared to deal with
***************
*** 1672,1678 ****
  		{
  			/* Assume there's no chance to put newtup on same page. */
  			newbuf = RelationGetBufferForTuple(relation, newtup->t_len,
! 											   buffer);
  		}
  		else
  		{
--- 1690,1696 ----
  		{
  			/* Assume there's no chance to put newtup on same page. */
  			newbuf = RelationGetBufferForTuple(relation, newtup->t_len,
! 											   buffer, zeroFreespace);
  		}
  		else
  		{
***************
*** 1689,1695 ****
  				 */
  				LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
  				newbuf = RelationGetBufferForTuple(relation, newtup->t_len,
! 												   buffer);
  			}
  			else
  			{
--- 1707,1713 ----
  				 */
  				LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
  				newbuf = RelationGetBufferForTuple(relation, newtup->t_len,
! 												   buffer, zeroFreespace);
  			}
  			else
  			{
diff -c -r postgresql-8.0.1/src/backend/access/heap/hio.c postgresql-8.0.1-wip/src/backend/access/heap/hio.c
*** postgresql-8.0.1/src/backend/access/heap/hio.c	2005-02-03 10:38:06.000000000 -0800
--- postgresql-8.0.1-wip/src/backend/access/heap/hio.c	2005-02-10 03:54:57.000000000 -0800
***************
*** 89,95 ****
   */
  Buffer
  RelationGetBufferForTuple(Relation relation, Size len,
! 						  Buffer otherBuffer)
  {
  	Buffer		buffer = InvalidBuffer;
  	Page		pageHeader;
--- 89,95 ----
   */
  Buffer
  RelationGetBufferForTuple(Relation relation, Size len,
! 						  Buffer otherBuffer, Size requiredFreespace)
  {
  	Buffer		buffer = InvalidBuffer;
  	Page		pageHeader;
***************
*** 192,198 ****
  		 */
  		pageHeader = (Page) BufferGetPage(buffer);
  		pageFreeSpace = PageGetFreeSpace(pageHeader);
! 		if (len <= pageFreeSpace)
  		{
  			/* use this page as future insert target, too */
  			relation->rd_targblock = targetBlock;
--- 192,198 ----
  		 */
  		pageHeader = (Page) BufferGetPage(buffer);
  		pageFreeSpace = PageGetFreeSpace(pageHeader);
! 		if (len <= (pageFreeSpace - requiredFreespace))
  		{
  			/* use this page as future insert target, too */
  			relation->rd_targblock = targetBlock;
diff -c -r postgresql-8.0.1/src/backend/commands/copy.c postgresql-8.0.1-wip/src/backend/commands/copy.c
*** postgresql-8.0.1/src/backend/commands/copy.c	2005-02-03 10:38:07.000000000 -0800
--- postgresql-8.0.1-wip/src/backend/commands/copy.c	2005-02-10 05:23:04.416770800 -0800
***************
*** 1953,1959 ****
  			/*
  			 * OK, store the tuple and create index entries for it
  			 */
! 			simple_heap_insert(rel, tuple);
  
  			if (resultRelInfo->ri_NumIndices > 0)
  				ExecInsertIndexTuples(slot, &(tuple->t_self), estate, false);
--- 1955,1961 ----
  			/*
  			 * OK, store the tuple and create index entries for it
  			 */
! 			partialfill_heap_insert(rel, tuple, copyFreespace);
  
  			if (resultRelInfo->ri_NumIndices > 0)
  				ExecInsertIndexTuples(slot, &(tuple->t_self), estate, false);
diff -c -r postgresql-8.0.1/src/backend/executor/execMain.c postgresql-8.0.1-wip/src/backend/executor/execMain.c
*** postgresql-8.0.1/src/backend/executor/execMain.c	2005-02-03 10:38:07.000000000 -0800
--- postgresql-8.0.1-wip/src/backend/executor/execMain.c	2005-02-10 04:58:01.200294440 -0800
***************
*** 1293,1299 ****
  	if (estate->es_into_relation_descriptor != NULL)
  	{
  		heap_insert(estate->es_into_relation_descriptor, tuple,
! 					estate->es_snapshot->curcid);
  		IncrAppended();
  	}
  
--- 1293,1299 ----
  	if (estate->es_into_relation_descriptor != NULL)
  	{
  		heap_insert(estate->es_into_relation_descriptor, tuple,
! 					estate->es_snapshot->curcid, zeroFreespace);
  		IncrAppended();
  	}
  
***************
*** 1369,1375 ****
  	 * insert the tuple
  	 */
  	newId = heap_insert(resultRelationDesc, tuple,
! 						estate->es_snapshot->curcid);
  
  	IncrAppended();
  	(estate->es_processed)++;
--- 1369,1375 ----
  	 * insert the tuple
  	 */
  	newId = heap_insert(resultRelationDesc, tuple,
! 						estate->es_snapshot->curcid, zeroFreespace);
  
  	IncrAppended();
  	(estate->es_processed)++;
diff -c -r postgresql-8.0.1/src/include/access/heapam.h postgresql-8.0.1-wip/src/include/access/heapam.h
*** postgresql-8.0.1/src/include/access/heapam.h	2005-02-03 10:38:12.000000000 -0800
--- postgresql-8.0.1-wip/src/include/access/heapam.h	2005-02-10 04:48:20.998498520 -0800
***************
*** 158,164 ****
  					ItemPointer tid);
  extern void setLastTid(const ItemPointer tid);
  
! extern Oid	heap_insert(Relation relation, HeapTuple tup, CommandId cid);
  extern int heap_delete(Relation relation, ItemPointer tid, ItemPointer ctid,
  			CommandId cid, Snapshot crosscheck, bool wait);
  extern int heap_update(Relation relation, ItemPointer otid, HeapTuple tup,
--- 158,164 ----
  					ItemPointer tid);
  extern void setLastTid(const ItemPointer tid);
  
! extern Oid	heap_insert(Relation relation, HeapTuple tup, CommandId cid, Size requiredFreespace);
  extern int heap_delete(Relation relation, ItemPointer tid, ItemPointer ctid,
  			CommandId cid, Snapshot crosscheck, bool wait);
  extern int heap_update(Relation relation, ItemPointer otid, HeapTuple tup,
***************
*** 166,171 ****
--- 166,172 ----
  extern int heap_mark4update(Relation relation, HeapTuple tup,
  				 Buffer *userbuf, CommandId cid);
  
+ extern Oid  partialfill_heap_insert(Relation relation, HeapTuple tup, Size requiredFreespace);
  extern Oid	simple_heap_insert(Relation relation, HeapTuple tup);
  extern void simple_heap_delete(Relation relation, ItemPointer tid);
  extern void simple_heap_update(Relation relation, ItemPointer otid,
***************
*** 205,208 ****
--- 206,212 ----
  extern void heap_freetuple(HeapTuple tuple);
  extern HeapTuple heap_addheader(int natts, bool withoid, Size structlen, void *structure);
  
+ #define copyFreespace ((Size) 0x000003FF)       /* 1023 bytes */
+ #define zeroFreespace ((Size) 0x00000000)       /* 0 bytes */
+ 
  #endif   /* HEAPAM_H */
diff -c -r postgresql-8.0.1/src/include/access/hio.h postgresql-8.0.1-wip/src/include/access/hio.h
*** postgresql-8.0.1/src/include/access/hio.h	2005-02-03 10:38:12.000000000 -0800
--- postgresql-8.0.1-wip/src/include/access/hio.h	2005-02-10 03:52:43.000000000 -0800
***************
*** 19,24 ****
  extern void RelationPutHeapTuple(Relation relation, Buffer buffer,
  					 HeapTuple tuple);
  extern Buffer RelationGetBufferForTuple(Relation relation, Size len,
! 						  Buffer otherBuffer);
  
  #endif   /* HIO_H */
--- 19,24 ----
  extern void RelationPutHeapTuple(Relation relation, Buffer buffer,
  					 HeapTuple tuple);
  extern Buffer RelationGetBufferForTuple(Relation relation, Size len,
! 						  Buffer otherBuffer, Size requiredFreespace);
  
  #endif   /* HIO_H */
---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Reply via email to