Re: [HACKERS] WAL logging volume and CREATE TABLE

2011-08-04 Thread Bruce Momjian

Patch applied.

---

Bruce Momjian wrote:
 Alvaro Herrera wrote:
  Excerpts from Bruce Momjian's message of mar ago 02 22:46:55 -0400 2011:
  
   I have created a documentation patch to clarify this, and to mention
   CREATE TABLE AS which also has this optimization.
  
  It doesn't seem particularly better to me.  How about something like
  
  In minimal level, WAL-logging of some operations can be safely skipped,
  which can make those operations much faster (see blah).  Operations on
  which this optimization can be applied include: 
  simplelist
   itemCREATE INDEX/item
   itemCLUSTER/item
   itemCREATE TABLE AS/item
   itemCOPY, when tables that were created or truncated in the same
   transaction
  /simplelist
  
  Minimal WAL does not contain enough information to reconstruct the data
  from a base backup and the WAL logs, so either literalarchive/ or
  literalhot_standby/ level must be used to enable ...
 
 Good idea --- updated patch attached.
 
 -- 
   Bruce Momjian  br...@momjian.ushttp://momjian.us
   EnterpriseDB http://enterprisedb.com
 
   + It's impossible for everything to be true. +


 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL logging volume and CREATE TABLE

2011-08-03 Thread Bruce Momjian
Alvaro Herrera wrote:
 Excerpts from Bruce Momjian's message of mar ago 02 22:46:55 -0400 2011:
 
  I have created a documentation patch to clarify this, and to mention
  CREATE TABLE AS which also has this optimization.
 
 It doesn't seem particularly better to me.  How about something like
 
 In minimal level, WAL-logging of some operations can be safely skipped,
 which can make those operations much faster (see blah).  Operations on
 which this optimization can be applied include: 
 simplelist
  itemCREATE INDEX/item
  itemCLUSTER/item
  itemCREATE TABLE AS/item
  itemCOPY, when tables that were created or truncated in the same
  transaction
 /simplelist
 
 Minimal WAL does not contain enough information to reconstruct the data
 from a base backup and the WAL logs, so either literalarchive/ or
 literalhot_standby/ level must be used to enable ...

Good idea --- updated patch attached.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
new file mode 100644
index 4fadca9..aac6c3b
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*** SET ENABLE_SEQSCAN TO OFF;
*** 1451,1461 
  This parameter can only be set at server start.
 /para
 para
! In literalminimal/ level, WAL-logging of some bulk operations, like
! commandCREATE INDEX/, commandCLUSTER/ and commandCOPY/ on
! a table that was created or truncated in the same transaction can be
! safely skipped, which can make those operations much faster (see
! xref linkend=populate-pitr). But minimal WAL does not contain
  enough information to reconstruct the data from a base backup and the
  WAL logs, so either literalarchive/ or literalhot_standby/
  level must be used to enable
--- 1451,1468 
  This parameter can only be set at server start.
 /para
 para
! In literalminimal/ level, WAL-logging of some bulk
! operations can be safely skipped, which can make those
! operations much faster (see xref linkend=populate-pitr).
! Operations in which this optimization can be applied include:
! simplelist
!  itemCREATE INDEX/item
!  itemCLUSTER/item
!  itemCREATE TABLE AS/item
!  itemCOPY into tables that were created or truncated in the same
!  transaction
! /simplelist
! But minimal WAL does not contain
  enough information to reconstruct the data from a base backup and the
  WAL logs, so either literalarchive/ or literalhot_standby/
  level must be used to enable

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] WAL logging volume and CREATE TABLE

2011-08-02 Thread Bruce Momjian
Our docs suggest an optimization to reduce WAL logging when you are
creating and populating a table:


http://www.postgresql.org/docs/9.0/static/runtime-config-wal.html#RUNTIME-CONFIG-WAL-SETTINGS

In minimal level, WAL-logging of some bulk operations, like CREATE
INDEX, CLUSTER and COPY on a table that was created or truncated in the
same transaction can be safely skipped, which can make those operations
much faster (see Section 14.4.7). But minimal WAL does not contain
enough information to reconstruct the data from a base backup and the
WAL logs, so either archive or hot_standby level must be used to enable
WAL archiving (archive_mode) and streaming replication.

I am confused why we issue significant WAL traffic for CREATE INDEX? 
Isn't the index either created or removed if the transaction fails? 
What crash recovery activity state do we need WAL logging for?  I
realize we have to do WAL logging for streaming replication, but CREATE
TABLE isn't going to affect that.   I also realize the index has to be
on disk on commit, but the same is true for doing the CREATE TABLE in
the same transaction block.

Does this optimization work for INSERT ... SELECT? Is this optimization
automatic for CREATE TABLE AS (SELECT INTO)?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL logging volume and CREATE TABLE

2011-08-02 Thread Merlin Moncure
On Tue, Aug 2, 2011 at 8:34 AM, Bruce Momjian br...@momjian.us wrote:
 Our docs suggest an optimization to reduce WAL logging when you are
 creating and populating a table:

        
 http://www.postgresql.org/docs/9.0/static/runtime-config-wal.html#RUNTIME-CONFIG-WAL-SETTINGS

        In minimal level, WAL-logging of some bulk operations, like CREATE
        INDEX, CLUSTER and COPY on a table that was created or truncated in the
        same transaction can be safely skipped, which can make those operations
        much faster (see Section 14.4.7). But minimal WAL does not contain
        enough information to reconstruct the data from a base backup and the
        WAL logs, so either archive or hot_standby level must be used to enable
        WAL archiving (archive_mode) and streaming replication.

 I am confused why we issue significant WAL traffic for CREATE INDEX?
 Isn't the index either created or removed if the transaction fails?
 What crash recovery activity state do we need WAL logging for?  I
 realize we have to do WAL logging for streaming replication, but CREATE
 TABLE isn't going to affect that.   I also realize the index has to be
 on disk on commit, but the same is true for doing the CREATE TABLE in
 the same transaction block.

 Does this optimization work for INSERT ... SELECT?

I don't think so -- insert/select doesn't take a full table lock and
it writes to the heap.  The optimization only works when other
backends will never see/touch the data being written out until it is
finished and it doesn't matter if the data is scrambled due to a
crash.  CREATE INDEX might work though.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL logging volume and CREATE TABLE

2011-08-02 Thread Heikki Linnakangas

On 02.08.2011 16:34, Bruce Momjian wrote:

Our docs suggest an optimization to reduce WAL logging when you are
creating and populating a table:


http://www.postgresql.org/docs/9.0/static/runtime-config-wal.html#RUNTIME-CONFIG-WAL-SETTINGS

In minimal level, WAL-logging of some bulk operations, like CREATE
INDEX, CLUSTER and COPY on a table that was created or truncated in the
same transaction can be safely skipped, which can make those operations
much faster (see Section 14.4.7). But minimal WAL does not contain
enough information to reconstruct the data from a base backup and the
WAL logs, so either archive or hot_standby level must be used to enable
WAL archiving (archive_mode) and streaming replication.

I am confused why we issue significant WAL traffic for CREATE INDEX?
Isn't the index either created or removed if the transaction fails?
What crash recovery activity state do we need WAL logging for?  I
realize we have to do WAL logging for streaming replication, but CREATE
TABLE isn't going to affect that.   I also realize the index has to be
on disk on commit, but the same is true for doing the CREATE TABLE in
the same transaction block.


I'm confused about what you're confused about. Crash recovery doesn't 
need the WAL for CREATE INDEX, but WAL archiving does.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL logging volume and CREATE TABLE

2011-08-02 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 Our docs suggest an optimization to reduce WAL logging when you are
 creating and populating a table:

   
 http://www.postgresql.org/docs/9.0/static/runtime-config-wal.html#RUNTIME-CONFIG-WAL-SETTINGS

   In minimal level, WAL-logging of some bulk operations, like CREATE
   INDEX, CLUSTER and COPY on a table that was created or truncated in the
   same transaction can be safely skipped, which can make those operations
   much faster (see Section 14.4.7). But minimal WAL does not contain
   enough information to reconstruct the data from a base backup and the
   WAL logs, so either archive or hot_standby level must be used to enable
   WAL archiving (archive_mode) and streaming replication.

 I am confused why we issue significant WAL traffic for CREATE INDEX? 

The point is that in minimal level we *don't*.  We just fsync the index
file before committing.  In higher levels we have to write the whole
index contents to the WAL, not only the disk file, so that the info
reaches the archive or standby slaves.

Same for the other cases.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL logging volume and CREATE TABLE

2011-08-02 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  Our docs suggest an optimization to reduce WAL logging when you are
  creating and populating a table:
 
  
  http://www.postgresql.org/docs/9.0/static/runtime-config-wal.html#RUNTIME-CONFIG-WAL-SETTINGS
   
  In minimal level, WAL-logging of some bulk operations, like CREATE
  INDEX, CLUSTER and COPY on a table that was created or truncated in the
  same transaction can be safely skipped, which can make those operations
  much faster (see Section 14.4.7). But minimal WAL does not contain
  enough information to reconstruct the data from a base backup and the
  WAL logs, so either archive or hot_standby level must be used to enable
  WAL archiving (archive_mode) and streaming replication.
 
  I am confused why we issue significant WAL traffic for CREATE INDEX? 
 
 The point is that in minimal level we *don't*.  We just fsync the index
 file before committing.  In higher levels we have to write the whole
 index contents to the WAL, not only the disk file, so that the info
 reaches the archive or standby slaves.
 
 Same for the other cases.

I realize the need for WAL logging CREATE INDEX for non-'minimal'
wal_level values.

But the documentation states the WAL logging is reduced for CREATE INDEX
by doing CREATE TABLE in the same transaction block.  Why is this true?
Why would the CREATE TABLE affect the CREATE INDEX WAL volume?

I am wondering if the documention is correct about CLUSTER and COPY, but
incorrect for CREATE INDEX.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL logging volume and CREATE TABLE

2011-08-02 Thread Bruce Momjian
Merlin Moncure wrote:
 On Tue, Aug 2, 2011 at 8:34 AM, Bruce Momjian br...@momjian.us wrote:
  Our docs suggest an optimization to reduce WAL logging when you are
  creating and populating a table:
 
  ? ? ? 
  ?http://www.postgresql.org/docs/9.0/static/runtime-config-wal.html#RUNTIME-CONFIG-WAL-SETTINGS
 
  ? ? ? ?In minimal level, WAL-logging of some bulk operations, like CREATE
  ? ? ? ?INDEX, CLUSTER and COPY on a table that was created or truncated in 
  the
  ? ? ? ?same transaction can be safely skipped, which can make those 
  operations
  ? ? ? ?much faster (see Section 14.4.7). But minimal WAL does not contain
  ? ? ? ?enough information to reconstruct the data from a base backup and the
  ? ? ? ?WAL logs, so either archive or hot_standby level must be used to 
  enable
  ? ? ? ?WAL archiving (archive_mode) and streaming replication.
 
  I am confused why we issue significant WAL traffic for CREATE INDEX?
  Isn't the index either created or removed if the transaction fails?
  What crash recovery activity state do we need WAL logging for? ?I
  realize we have to do WAL logging for streaming replication, but CREATE
  TABLE isn't going to affect that. ? I also realize the index has to be
  on disk on commit, but the same is true for doing the CREATE TABLE in
  the same transaction block.
 
  Does this optimization work for INSERT ... SELECT?
 
 I don't think so -- insert/select doesn't take a full table lock and
 it writes to the heap.  The optimization only works when other

My question is whether INSERT ... SELECT is/could be optimized when the
CREATE TABLE happens in the same transaction block.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL logging volume and CREATE TABLE

2011-08-02 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 In minimal level, WAL-logging of some bulk operations, like CREATE
 INDEX, CLUSTER and COPY on a table that was created or truncated in the
 same transaction can be safely skipped, which can make those operations
 much faster (see Section 14.4.7).

 But the documentation states the WAL logging is reduced for CREATE INDEX
 by doing CREATE TABLE in the same transaction block.  Why is this true?

It's not true, and it doesn't say that, or at least doesn't intend to
say that.  That sentence is meant to be read as:

1. The optimization applies to CREATE INDEX.
2. The optimization applies to CLUSTER or COPY on a table that was
created or truncated in the current transaction.

I now see your point, which is that the sentence is easily misparsed.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL logging volume and CREATE TABLE

2011-08-02 Thread Robert Haas
On Tue, Aug 2, 2011 at 11:30 AM, Bruce Momjian br...@momjian.us wrote:
 Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  Our docs suggest an optimization to reduce WAL logging when you are
  creating and populating a table:

      
  http://www.postgresql.org/docs/9.0/static/runtime-config-wal.html#RUNTIME-CONFIG-WAL-SETTINGS

      In minimal level, WAL-logging of some bulk operations, like CREATE
      INDEX, CLUSTER and COPY on a table that was created or truncated in the
      same transaction can be safely skipped, which can make those operations
      much faster (see Section 14.4.7). But minimal WAL does not contain
      enough information to reconstruct the data from a base backup and the
      WAL logs, so either archive or hot_standby level must be used to enable
      WAL archiving (archive_mode) and streaming replication.

  I am confused why we issue significant WAL traffic for CREATE INDEX?

 The point is that in minimal level we *don't*.  We just fsync the index
 file before committing.  In higher levels we have to write the whole
 index contents to the WAL, not only the disk file, so that the info
 reaches the archive or standby slaves.

 Same for the other cases.

 I realize the need for WAL logging CREATE INDEX for non-'minimal'
 wal_level values.

 But the documentation states the WAL logging is reduced for CREATE INDEX
 by doing CREATE TABLE in the same transaction block.  Why is this true?
 Why would the CREATE TABLE affect the CREATE INDEX WAL volume?

 I am wondering if the documention is correct about CLUSTER and COPY, but
 incorrect for CREATE INDEX.

I think the problem here might be ambiguous wording.  I believe that
the modifier on a table that was created or truncated in the same
transaction is intended to apply only to COPY, but the way it's
written, someone (such as you) might be forgiven for thinking that it
applied to the larger phrase CREATE INDEX, CLUSTER, or COPY.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL logging volume and CREATE TABLE

2011-08-02 Thread Bruce Momjian
Robert Haas wrote:
 On Tue, Aug 2, 2011 at 11:30 AM, Bruce Momjian br...@momjian.us wrote:
  Tom Lane wrote:
  Bruce Momjian br...@momjian.us writes:
   Our docs suggest an optimization to reduce WAL logging when you are
   creating and populating a table:
 
   ? ? 
   http://www.postgresql.org/docs/9.0/static/runtime-config-wal.html#RUNTIME-CONFIG-WAL-SETTINGS
 
   ? ? In minimal level, WAL-logging of some bulk operations, like CREATE
   ? ? INDEX, CLUSTER and COPY on a table that was created or truncated in 
   the
   ? ? same transaction can be safely skipped, which can make those 
   operations
   ? ? much faster (see Section 14.4.7). But minimal WAL does not contain
   ? ? enough information to reconstruct the data from a base backup and the
   ? ? WAL logs, so either archive or hot_standby level must be used to 
   enable
   ? ? WAL archiving (archive_mode) and streaming replication.
 
   I am confused why we issue significant WAL traffic for CREATE INDEX?
 
  The point is that in minimal level we *don't*. ?We just fsync the index
  file before committing. ?In higher levels we have to write the whole
  index contents to the WAL, not only the disk file, so that the info
  reaches the archive or standby slaves.
 
  Same for the other cases.
 
  I realize the need for WAL logging CREATE INDEX for non-'minimal'
  wal_level values.
 
  But the documentation states the WAL logging is reduced for CREATE INDEX
  by doing CREATE TABLE in the same transaction block. ?Why is this true?
  Why would the CREATE TABLE affect the CREATE INDEX WAL volume?
 
  I am wondering if the documention is correct about CLUSTER and COPY, but
  incorrect for CREATE INDEX.
 
 I think the problem here might be ambiguous wording.  I believe that
 the modifier on a table that was created or truncated in the same
 transaction is intended to apply only to COPY, but the way it's
 written, someone (such as you) might be forgiven for thinking that it
 applied to the larger phrase CREATE INDEX, CLUSTER, or COPY.

I have created a documentation patch to clarify this, and to mention
CREATE TABLE AS which also has this optimization.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
new file mode 100644
index 4fadca9..a1f51ec
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*** SET ENABLE_SEQSCAN TO OFF;
*** 1452,1461 
 /para
 para
  In literalminimal/ level, WAL-logging of some bulk operations, like
! commandCREATE INDEX/, commandCLUSTER/ and commandCOPY/ on
! a table that was created or truncated in the same transaction can be
! safely skipped, which can make those operations much faster (see
! xref linkend=populate-pitr). But minimal WAL does not contain
  enough information to reconstruct the data from a base backup and the
  WAL logs, so either literalarchive/ or literalhot_standby/
  level must be used to enable
--- 1452,1463 
 /para
 para
  In literalminimal/ level, WAL-logging of some bulk operations, like
! commandCREATE INDEX/, commandCLUSTER/, and commandCREATE
! TABLE AS/, can be safely skipped, which can make those
! operations much faster (see xref linkend=populate-pitr).
! In minimal WAL-logging mode, it is also possible to skip WAL-logging of
! and commandCOPY/ operations on tables that were created
! or truncated in the same transaction.  But minimal WAL does not contain
  enough information to reconstruct the data from a base backup and the
  WAL logs, so either literalarchive/ or literalhot_standby/
  level must be used to enable

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL logging volume and CREATE TABLE

2011-08-02 Thread Alvaro Herrera
Excerpts from Bruce Momjian's message of mar ago 02 22:46:55 -0400 2011:

 I have created a documentation patch to clarify this, and to mention
 CREATE TABLE AS which also has this optimization.

It doesn't seem particularly better to me.  How about something like

In minimal level, WAL-logging of some operations can be safely skipped,
which can make those operations much faster (see blah).  Operations on
which this optimization can be applied include: 
simplelist
 itemCREATE INDEX/item
 itemCLUSTER/item
 itemCREATE TABLE AS/item
 itemCOPY, when tables that were created or truncated in the same
 transaction
/simplelist

Minimal WAL does not contain enough information to reconstruct the data
from a base backup and the WAL logs, so either literalarchive/ or
literalhot_standby/ level must be used to enable ...

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers