Re: [PATCHES] COPY LOCK for WAL bypass

2005-12-21 Thread Bruce Momjian
Simon Riggs wrote:
 On Sun, 2005-12-18 at 22:03 -0500, Chris Browne wrote:
  [EMAIL PROTECTED] (Simon Riggs) writes:
   On Sat, 2005-12-10 at 12:07 +, Simon Riggs wrote:
   Following patch implements COPY ... FROM ... LOCK
  
   Patch now updated so that it includes an additional optimization of
   COPY, so that WAL will not be written in the transaction that created
   the table.
  
   This now gives two fast paths for COPY:
   1) COPY LOCK
   2) COPY in same transaction (e.g. reloading a pg_dump)
  
  I presume that if this doesn't go into WAL, that means that this kind
  of update wouldn't play with PITR, right?
 
 You're right.
 
 PITR is designed for normal production use, rather than initial loading.
 It is also fairly common to turn PITR off permanently in larger data
 warehouses, which is also where these optimizations are aimed.
 
 I'll take the hint and write the docs then. :-)

I assume in addition to a documentation mention, you will disable this
feature when PITR is on, right?  (Rather than just document something
that is unsafe, we disable it.)

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] COPY LOCK for WAL bypass

2005-12-21 Thread Bruce Momjian
Jim C. Nasby wrote:
 On Mon, Dec 19, 2005 at 07:51:54AM +, Simon Riggs wrote:
  On Sun, 2005-12-18 at 22:03 -0500, Chris Browne wrote:
   [EMAIL PROTECTED] (Simon Riggs) writes:
On Sat, 2005-12-10 at 12:07 +, Simon Riggs wrote:
Following patch implements COPY ... FROM ... LOCK
   
Patch now updated so that it includes an additional optimization of
COPY, so that WAL will not be written in the transaction that created
the table.
   
This now gives two fast paths for COPY:
1) COPY LOCK
2) COPY in same transaction (e.g. reloading a pg_dump)
   
   I presume that if this doesn't go into WAL, that means that this kind
   of update wouldn't play with PITR, right?
  
  You're right.
  
  PITR is designed for normal production use, rather than initial loading.
  It is also fairly common to turn PITR off permanently in larger data
  warehouses, which is also where these optimizations are aimed.
 
 Hrm... I'd say we need an option to disable the fast-copy then, in case
 you wanted the copy to make it into PITR. Or perhaps we just disallow
 the fast-copy when PITR is in use. I believe that's what other databases
 do...

Right, just disable with a clear error message.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] COPY LOCK for WAL bypass

2005-12-21 Thread Simon Riggs
On Wed, 2005-12-21 at 08:18 -0500, Bruce Momjian wrote:
 Jim C. Nasby wrote:
  On Mon, Dec 19, 2005 at 07:51:54AM +, Simon Riggs wrote:
   On Sun, 2005-12-18 at 22:03 -0500, Chris Browne wrote:
[EMAIL PROTECTED] (Simon Riggs) writes:
 On Sat, 2005-12-10 at 12:07 +, Simon Riggs wrote:
 Following patch implements COPY ... FROM ... LOCK

 Patch now updated so that it includes an additional optimization of
 COPY, so that WAL will not be written in the transaction that created
 the table.

 This now gives two fast paths for COPY:
 1) COPY LOCK
 2) COPY in same transaction (e.g. reloading a pg_dump)

I presume that if this doesn't go into WAL, that means that this kind
of update wouldn't play with PITR, right?
   
   You're right.
   
   PITR is designed for normal production use, rather than initial loading.
   It is also fairly common to turn PITR off permanently in larger data
   warehouses, which is also where these optimizations are aimed.
  
  Hrm... I'd say we need an option to disable the fast-copy then, in case
  you wanted the copy to make it into PITR. Or perhaps we just disallow
  the fast-copy when PITR is in use. I believe that's what other databases
  do...
 
 Right, just disable with a clear error message.
 

Forgive me if my earlier reply was not clear: This patch works just the
same way CREATE INDEX and CREATE TABLE AS SELECT already works:
- if the server has archive_command set, WAL is written
- if archive_command is not set then (assuming other conditions apply)
the writing WAL will be bypassed for performance.
There is no error-message generated, by COPY LOCK nor the other two
previously mentioned commands.

Writing docs now.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] COPY LOCK for WAL bypass

2005-12-21 Thread Simon Riggs
On Wed, 2005-12-21 at 16:23 +, Simon Riggs wrote:
 Writing docs now.

Here's the docs...

Hopefully that explains everything a little betterbut this doc patch
contains a ref to the as yet unapplied --single-transaction patch, but
also explains why that patch exists also.

[Patch via doc/src/sgml/ref rather than from tree top]

Best Regards, Simon Riggs
Index: copy.sgml
===
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/ref/copy.sgml,v
retrieving revision 1.70
diff -c -r1.70 copy.sgml
*** copy.sgml	15 Oct 2005 20:12:33 -	1.70
--- copy.sgml	21 Dec 2005 17:04:08 -
***
*** 26,31 
--- 26,32 
  [ [ WITH ] 
[ BINARY ]
[ OIDS ]
+   [ LOCK ]
[ DELIMITER [ AS ] 'replaceable class=parameterdelimiter/replaceable' ]
[ NULL [ AS ] 'replaceable class=parameternull string/replaceable' ]
[ CSV [ HEADER ]
***
*** 39,44 
--- 40,46 
[ BINARY ]
[ HEADER ]
[ OIDS ]
+   [ LOCK ]
[ DELIMITER [ AS ] 'replaceable class=parameterdelimiter/replaceable' ]
[ NULL [ AS ] 'replaceable class=parameternull string/replaceable' ]
[ CSV [ HEADER ]
***
*** 152,157 
--- 154,176 
 /varlistentry
  
 varlistentry
+ termliteralLOCK/literal/term
+ listitem
+  para
+   An ExclusiveLock will be taken on the table prior to execution. This will 
+   prevent concurrent writes to the table, as well as preventing 
+   commandVACUUM/command, commandCREATE INDEX/command and other DDL 
+   commands.
+  /para
+  para
+   With commandCOPY FROM/command this option can be used to greatly 
+   increase performance of data loading, in certain conditions. 
+   (see Notes below).
+  /para
+ /listitem
+/varlistentry
+ 
+varlistentry
  termreplaceable class=parameterdelimiter/replaceable/term
  listitem
   para
***
*** 332,337 
--- 351,378 
  happened well into a large copy operation. You may wish to invoke
  commandVACUUM/command to recover the wasted space.
 /para
+ 
+para
+ With commandCOPY FROM/command the optionLOCK/option is designed
+ to increase performance of data loads when no indexes are present and 
+ the server does not have archive_command set. When those pre-conditions
+ are met, commandCOPY FROM/command will not write WAL records for the
+ data being loaded. This will greatly increase performance for large data
+ loads. An ExclusiveLock is taken to prevent concurrent writes during this 
+ operation. The optionLOCK/option option is available for 
+ commandCOPY TO/command also, but in that case does nothing to increase 
+ performance.
+/para
+ 
+para
+ If the commandCOPY FROM/command command is issued in the same 
+ top level transaction as the command that created the target table, then 
+ performance will also be enhanced, with or without the optionLOCK/option 
+ option. This optimization will greatly enhance the speed of data recovery 
+ from a pg_dump file or software upgrades that required a re-initdb. You can 
+ take advantage of this easily using the option--single-transaction/option
+ option on commandpsql/command or commandpg_restore/command.
+/para
   /refsect1
   
   refsect1

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] COPY LOCK for WAL bypass

2005-12-21 Thread Bruce Momjian
Simon Riggs wrote:
  Right, just disable with a clear error message.
  
 
 Forgive me if my earlier reply was not clear: This patch works just the
 same way CREATE INDEX and CREATE TABLE AS SELECT already works:
 - if the server has archive_command set, WAL is written
 - if archive_command is not set then (assuming other conditions apply)
 the writing WAL will be bypassed for performance.
 There is no error-message generated, by COPY LOCK nor the other two
 previously mentioned commands.

Sounds good.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] COPY LOCK for WAL bypass

2005-12-20 Thread Jim C. Nasby
On Mon, Dec 19, 2005 at 07:51:54AM +, Simon Riggs wrote:
 On Sun, 2005-12-18 at 22:03 -0500, Chris Browne wrote:
  [EMAIL PROTECTED] (Simon Riggs) writes:
   On Sat, 2005-12-10 at 12:07 +, Simon Riggs wrote:
   Following patch implements COPY ... FROM ... LOCK
  
   Patch now updated so that it includes an additional optimization of
   COPY, so that WAL will not be written in the transaction that created
   the table.
  
   This now gives two fast paths for COPY:
   1) COPY LOCK
   2) COPY in same transaction (e.g. reloading a pg_dump)
  
  I presume that if this doesn't go into WAL, that means that this kind
  of update wouldn't play with PITR, right?
 
 You're right.
 
 PITR is designed for normal production use, rather than initial loading.
 It is also fairly common to turn PITR off permanently in larger data
 warehouses, which is also where these optimizations are aimed.

Hrm... I'd say we need an option to disable the fast-copy then, in case
you wanted the copy to make it into PITR. Or perhaps we just disallow
the fast-copy when PITR is in use. I believe that's what other databases
do...
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] COPY LOCK for WAL bypass

2005-12-19 Thread Simon Riggs
On Sun, 2005-12-18 at 22:03 -0500, Chris Browne wrote:
 [EMAIL PROTECTED] (Simon Riggs) writes:
  On Sat, 2005-12-10 at 12:07 +, Simon Riggs wrote:
  Following patch implements COPY ... FROM ... LOCK
 
  Patch now updated so that it includes an additional optimization of
  COPY, so that WAL will not be written in the transaction that created
  the table.
 
  This now gives two fast paths for COPY:
  1) COPY LOCK
  2) COPY in same transaction (e.g. reloading a pg_dump)
 
 I presume that if this doesn't go into WAL, that means that this kind
 of update wouldn't play with PITR, right?

You're right.

PITR is designed for normal production use, rather than initial loading.
It is also fairly common to turn PITR off permanently in larger data
warehouses, which is also where these optimizations are aimed.

I'll take the hint and write the docs then. :-)

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PATCHES] COPY LOCK for WAL bypass

2005-12-18 Thread Simon Riggs
On Sat, 2005-12-10 at 12:07 +, Simon Riggs wrote:
 Following patch implements COPY ... FROM ... LOCK

Patch now updated so that it includes an additional optimization of
COPY, so that WAL will not be written in the transaction that created
the table.

This now gives two fast paths for COPY:
1) COPY LOCK
2) COPY in same transaction (e.g. reloading a pg_dump)

Patch passes make check on cvstip.

No docs yet, but let me know if this is OK and I'll work on them.

[Other copied in from the related patch thread on Single-Transaction
Utility options. With this new COPY optimization the
--single-transaction option will considerably increase performance.]

Performance tests shown on previous post for this thread.

Best Regards, Simon Riggs
Index: src/backend/access/heap/heapam.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/access/heap/heapam.c,v
retrieving revision 1.205
diff -c -r1.205 heapam.c
*** src/backend/access/heap/heapam.c	26 Nov 2005 05:03:06 -	1.205
--- src/backend/access/heap/heapam.c	18 Dec 2005 21:10:52 -
***
*** 28,33 
--- 28,34 
   *		heap_update		- replace a tuple in a relation with another tuple
   *		heap_markpos	- mark scan position
   *		heap_restrpos	- restore position to marked location
+  *  heap_sync   - sync heap, for when no WAL has been written
   *
   * NOTES
   *	  This file contains the heap_ routines which implement
***
*** 49,54 
--- 50,56 
  #include miscadmin.h
  #include pgstat.h
  #include storage/procarray.h
+ #include storage/smgr.h
  #include utils/inval.h
  #include utils/relcache.h
  
***
*** 1335,1342 
   * non-temp relation.  Safe usage of this behavior requires that we arrange
   * that all new tuples go into new pages not containing any tuples from other
   * transactions, that the relation gets fsync'd before commit, and that the
!  * transaction emits at least one WAL record to ensure RecordTransactionCommit
!  * will decide to WAL-log the commit.
   *
   * use_fsm is passed directly to RelationGetBufferForTuple, which see for
   * more info.
--- 1337,1345 
   * non-temp relation.  Safe usage of this behavior requires that we arrange
   * that all new tuples go into new pages not containing any tuples from other
   * transactions, that the relation gets fsync'd before commit, and that the
!  * transaction emits at least one WAL record or must mark 
!  * MyXactMustRecordCommitIfAny=true to ensure RecordTransactionCommit will
!  * decide to WAL-log the commit. (see heap_sync() comments also)
   *
   * use_fsm is passed directly to RelationGetBufferForTuple, which see for
   * more info.
***
*** 1413,1419 
  	if (relation-rd_istemp)
  	{
  		/* No XLOG record, but still need to flag that XID exists on disk */
! 		MyXactMadeTempRelUpdate = true;
  	}
  	else if (use_wal)
  	{
--- 1416,1422 
  	if (relation-rd_istemp)
  	{
  		/* No XLOG record, but still need to flag that XID exists on disk */
! 		MyXactMustRecordCommitIfAny = true;
  	}
  	else if (use_wal)
  	{
***
*** 1731,1737 
  	else
  	{
  		/* No XLOG record, but still need to flag that XID exists on disk */
! 		MyXactMadeTempRelUpdate = true;
  	}
  
  	END_CRIT_SECTION();
--- 1734,1740 
  	else
  	{
  		/* No XLOG record, but still need to flag that XID exists on disk */
! 		MyXactMustRecordCommitIfAny = true;
  	}
  
  	END_CRIT_SECTION();
***
*** 2172,2178 
  	else
  	{
  		/* No XLOG record, but still need to flag that XID exists on disk */
! 		MyXactMadeTempRelUpdate = true;
  	}
  
  	END_CRIT_SECTION();
--- 2175,2181 
  	else
  	{
  		/* No XLOG record, but still need to flag that XID exists on disk */
! 		MyXactMustRecordCommitIfAny = true;
  	}
  
  	END_CRIT_SECTION();
***
*** 2674,2680 
  	else
  	{
  		/* No XLOG record, but still need to flag that XID exists on disk */
! 		MyXactMadeTempRelUpdate = true;
  	}
  
  	END_CRIT_SECTION();
--- 2677,2683 
  	else
  	{
  		/* No XLOG record, but still need to flag that XID exists on disk */
! 		MyXactMustRecordCommitIfAny = true;
  	}
  
  	END_CRIT_SECTION();
***
*** 3456,3458 
--- 3459,3490 
  	else
  		strcat(buf, UNKNOWN);
  }
+ 
+ /* 
+  *		heap_sync - sync a heap, for use when no WAL has been written
+  *
+  * 
+  */
+ void
+ heap_sync(Relation rel, bool needToRecordCommit)
+ {
+ /* 
+  * If our transaction has avoided writing WAL up to now, 
+  * we need to force the changes to be written to disk then make 
+  * commit record visible in clog, so other users can view these changes
+  */
+ if (needToRecordCommit)
+ MyXactMustRecordCommitIfAny = true;
+ 
+ if (!rel-rd_istemp)
+ {
+ /*
+  * If we skipped using WAL, and it's not a temp relation,
+  * we must force the relation down to disk before it's
+  * 

Re: [PATCHES] COPY LOCK for WAL bypass

2005-12-18 Thread Chris Browne
[EMAIL PROTECTED] (Simon Riggs) writes:
 On Sat, 2005-12-10 at 12:07 +, Simon Riggs wrote:
 Following patch implements COPY ... FROM ... LOCK

 Patch now updated so that it includes an additional optimization of
 COPY, so that WAL will not be written in the transaction that created
 the table.

 This now gives two fast paths for COPY:
 1) COPY LOCK
 2) COPY in same transaction (e.g. reloading a pg_dump)

I presume that if this doesn't go into WAL, that means that this kind
of update wouldn't play with PITR, right?

That's presumably something that otta be pretty carefully documented
:-).
-- 
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://cbbrowne.com/info/nonrdbms.html
Microsoft Outlook: Deploying Viruses Has Never Been This Easy!

---(end of broadcast)---
TIP 6: explain analyze is your friend


[PATCHES] COPY LOCK for WAL bypass

2005-12-10 Thread Simon Riggs
Following patch implements COPY ... FROM ... LOCK as discussed earlier
this year on these threads:
http://archives.postgresql.org/pgsql-hackers/2005-06/msg00069.php
http://archives.postgresql.org/pgsql-hackers/2005-06/msg00075.php

The purpose of the new command is to make an explicit request to run
COPY without producing WAL records (i.e. no logging), so as to improve
the performance of data loads. (This is the first of a number of COPY
performance optimizations, discussed on -hackers).

Default COPY is unchanged. 

LOCK option takes an EXCLUSIVE lock (but perhaps that should be a SHARE
ROW EXCLUSIVE?), allowing it to block out CREATE INDEX builds and
VACUUM.

LOCK option will also cause writing of WAL records to be skipped when
XLogArchivingActive() and there are no indexes.

The implementation refactors the code used by CTAS for syncing the data
to disk once command is complete; COPY and CTAS now both use that code.

COPY .. LOCK doesn't write any XLog records as was previously suggested.
My train of thought: After some thought, no other heap-based xlog action
would leave the table in a consistent state after failure. Nobody wants
to see a random single row in the table. I looked into UPDATEing the
last row loaded to generate an xlog rec but it would be difficult to do
that without being horribly kludgy. I looked at adding a new xlog
action, but there is only one slot left for a heap-based xlog action, so
that seemed unwise. I wrote this using an RM_XLOG xlrec, but that
doesn't actually trigger a commit write (I discover). I've simply set a
flag to tell the transaction to record the commit anyway. That was
already there in heapam.c anyway, but just for temp relations; I've
changed the name of the variable to indicate what it does now, across a
number of files.

(It is also arguable that I should implement a WAL record that truncates
the file back down to the starting size, in the event of a failure. I'm
not sure where we were on that; there seem to be comments both in favour
and against that idea. I can see the use for that, so I'll be happy to
add that also, if we agree there is no danger.)

I've added a few lock options onto the copy.sql test script, but not
added (yet) a full suite of testing.

No docs, yet.

Short performance test shown below for 10^6 rows, one col table.
1. Normal COPY 4.5s 11.4s 6.0s 6.1s
2. COPY LOCK 3.0s 2.7s 2.8s 2.7s
with postgresql.conf all default apart from: checkpoint_segments=30

This test was an all in cache test. The improvement is substantial,
but the numbers above are best case, IMHO: I would expect only 10-40%
improvement for larger loads in the general case.

Short output shown below, with checkpoint_segments=3, so timings for the
standard non-LOCK COPY probably include checkpoint time also.

===
postgres=# create table ctest as select
generate_series(1,100)::integer as col1;
SELECT
postgres=# copy ctest to '/usr/local/pgsql/ctest.data';
COPY
postgres=# truncate ctest;
TRUNCATE TABLE
Time: 41.343 ms
postgres=# copy ctest from '/usr/local/pgsql/ctest.data';
COPY
Time: 7111.205 ms
postgres=# truncate ctest;
TRUNCATE TABLE
Time: 23.175 ms
postgres=# copy ctest from '/usr/local/pgsql/ctest.data' lock;
COPY
Time: 2992.482 ms
postgres=# truncate ctest;
TRUNCATE TABLE
Time: 8.306 ms
postgres=# copy ctest from '/usr/local/pgsql/ctest.data';
COPY
Time: 7433.166 ms


Best Regards, Simon Riggs
Index: src/backend/access/heap/heapam.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/access/heap/heapam.c,v
retrieving revision 1.205
diff -c -r1.205 heapam.c
*** src/backend/access/heap/heapam.c	26 Nov 2005 05:03:06 -	1.205
--- src/backend/access/heap/heapam.c	10 Dec 2005 10:54:58 -
***
*** 28,33 
--- 28,34 
   *		heap_update		- replace a tuple in a relation with another tuple
   *		heap_markpos	- mark scan position
   *		heap_restrpos	- restore position to marked location
+  *  heap_sync   - sync heap, for when no WAL has been written
   *
   * NOTES
   *	  This file contains the heap_ routines which implement
***
*** 49,54 
--- 50,56 
  #include miscadmin.h
  #include pgstat.h
  #include storage/procarray.h
+ #include storage/smgr.h
  #include utils/inval.h
  #include utils/relcache.h
  
***
*** 1335,1342 
   * non-temp relation.  Safe usage of this behavior requires that we arrange
   * that all new tuples go into new pages not containing any tuples from other
   * transactions, that the relation gets fsync'd before commit, and that the
!  * transaction emits at least one WAL record to ensure RecordTransactionCommit
!  * will decide to WAL-log the commit.
   *
   * use_fsm is passed directly to RelationGetBufferForTuple, which see for
   * more info.
--- 1337,1345 
   * non-temp relation.  Safe usage of this behavior requires that we arrange
   * that all new tuples go into new pages