[PATCHES] Partitioning docs WIP

Simon Riggs Sun, 30 Oct 2005 17:54:14 -0800

I've been working on some docs for Constraining Exclusion & Partitioning
for some time now. Deadlines seem to be looming, or may even have
passed, so it seems sensible to submit what I have now.


It's still a WIP: The final section on queries is not yet complete, but
the overall structure and flow makes sense now.

Many thanks to Josh Berkus for providing the numbered section on
implementation process, which was the starting point I'd been looking
for to describe everything else.

Any comments welcome now... flames expected for lateness.

Best Regards, Simon Riggs

Index: ddl.sgml
===================================================================
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/ddl.sgml,v
retrieving revision 1.45
diff -c -c -r1.45 ddl.sgml
*** ddl.sgml	23 Oct 2005 19:29:49 -0000	1.45
--- ddl.sgml	31 Oct 2005 01:38:26 -0000
***************
*** 398,403 ****
--- 398,410 ----
      ensure that a column does not contain null values, the not-null
      constraint described in the next section can be used.
     </para>
+ 
+     <para>
+      Check constraints can also be used to enhance performance with
+      very large tables, when used in conjunction with the 
+      <xref linkend="guc-constraint-exclusion"> parameter. 
+      This is discussed in more detail in <xref linkend="ce-partitioning">
+    </para>
    </sect2>
  
    <sect2>
***************
*** 1040,1052 ****
   <sect1 id="ddl-inherit">
    <title>Inheritance</title>
  
!   <remark>This section needs to be rethought.  Some of the
!   information should go into the following chapters.</remark>
  
    <para>
!    Let's create two tables. The capitals  table  contains
!    state  capitals  which  are also cities. Naturally, the
!    capitals table should inherit from cities.
  
  <programlisting>
  CREATE TABLE cities (
--- 1047,1081 ----
   <sect1 id="ddl-inherit">
    <title>Inheritance</title>
  
!    <indexterm>
!     <primary>not-null constraint</primary>
!    </indexterm>
! 
!    <indexterm>
!     <primary>constraint</primary>
!     <secondary>NOT NULL</secondary>
!    </indexterm>
  
    <para>
!    <productname>PostgreSQL</productname> was the first DBMS to introduce
!    inheritance, as one its object-relational features.
!    <productname>PostgreSQL</productname> implements table
!    inheritance, a useful tool for database designers. The SQL:2003 standard
!    also allows inheritance, but this is type inheritance rather than the
!    features described here.
!   </para>
! 
!   <para>
!    Let's start with an example: 
!    We're trying to build a data model for cities, but we have a problem.
!    Each state has many cities, but only one capital. We want to be able
!    to quickly retrieve the capital city for any particular state. We
!    can solve this problem by creating two tables. The capitals table contains
!    state  capitals, then we have another table for cities that aren't capitals.
!    What happens when we want to ask for data about a city, regardless of
!    whether it is a capital or not? We can use the inheritance feature to
!    help resolve this problem for us. We define the capitals table so that
!    it inherits from cities.
  
  <programlisting>
  CREATE TABLE cities (
***************
*** 1062,1077 ****
  
     In this case, a row of capitals <firstterm>inherits</firstterm> all
     attributes (name, population, and altitude) from its parent, cities.  State
!    capitals have an extra attribute, state, that shows their state.  In
!    <productname>PostgreSQL</productname>, a table can inherit from zero or
     more other tables, and a query can reference either all rows of a table or
     all rows of a table plus all of its descendants.
- 
-    <note>
-     <para>
-      The inheritance hierarchy is actually a directed acyclic graph.
-     </para>
-    </note>
    </para>
  
    <para>
--- 1091,1102 ----
  
     In this case, a row of capitals <firstterm>inherits</firstterm> all
     attributes (name, population, and altitude) from its parent, cities.  State
!    capitals have an extra attribute, state, that shows their state.  
!   </para>
!   <para>
!    In <productname>PostgreSQL</productname>, a table can inherit from zero or
     more other tables, and a query can reference either all rows of a table or
     all rows of a table plus all of its descendants.
    </para>
  
    <para>
***************
*** 1133,1163 ****
    </para>
  </note>
    
!   <note>
!    <title>Deprecated</title> 
!    <para>
!      In previous versions of <productname>PostgreSQL</productname>, the
!      default behavior was not to include child tables in queries. This was
!      found to be error prone and is also in violation of the SQL:2003
!      standard. Under the old syntax, to get the sub-tables you append
!      <literal>*</literal> to the table name.
!      For example
! <programlisting>
! SELECT * from cities*;
! </programlisting>
!      You can still explicitly specify scanning child tables by appending
!      <literal>*</literal>, as well as explicitly specify not scanning child tables by
!      writing <quote>ONLY</quote>.  But beginning in version 7.1, the default
!      behavior for an undecorated table name is to scan its child tables
!      too, whereas before the default was not to do so.  To get the old
!      default behavior, set the configuration option
!      <literal>SQL_Inheritance</literal> to off, e.g.,
! <programlisting>
! SET SQL_Inheritance TO OFF;
! </programlisting>
!      or add a line in your <filename>postgresql.conf</filename> file.
!    </para>
!   </note>
  
    <para>
    In some cases you may wish to know which table a particular row
--- 1158,1180 ----
    </para>
  </note>
    
!   <para>
!    Inheritance does not automatically propogate data from INSERT or COPY 
!    commands to other tables in the inheritance hierarchy. Inheritance
!    does not influence these commands.
!    In our example, the following INSERT statement will fail.
! <programlisting>
! INSERT INTO cities 
! (name, population, altitude, state)
! VALUES ('New York', NULL, NULL, 'NY');
! </programlisting>         
!    We might hope that the data would be somehow routed to the capitals table,
!    though this does not happen. If the child has no locally defined
!    columns, then it is possible to route data from the parent to the child
!    using a RULE, see <xref linkend="rules-update">. 
!    This is not possible with the above INSERT statement
!    because the state column does not exist on both tables.
!   </para>
  
    <para>
    In some cases you may wish to know which table a particular row
***************
*** 1204,1220 ****
    </para>
  
    <para>
!    A table can inherit from more than one parent table, in which case it has
!    the union of the columns defined by the parent tables (plus any columns
!    declared specifically for the child table).
    </para>
  
    <para>
!    A serious limitation of the inheritance feature is that indexes (including
!    unique constraints) and foreign key constraints only apply to single
!    tables, not to their inheritance children.  This is true on both the
!    referencing and referenced sides of a foreign key constraint.  Thus,
!    in the terms of the above example:
  
     <itemizedlist>
      <listitem>
--- 1221,1287 ----
    </para>
  
    <para>
!    As shown above, a child table may locally define columns as 
!    well as inheriting them from their parents. 
!    However, polymorphism of table datatypes is not supported: a locally defined
!    column cannot override the datatype of an inherited column of the same name.
!   </para>
! 
!   <para>
!    A table can inherit from a table that has itself inherited from 
!    other tables. A table can also inherit from more than one parent 
!    table, in which case it has the union of the columns defined by the parent 
!    tables (plus any columns declared locally for the child table). 
!    Inherited columns with duplicate names and datatypes will be merged so
!    that only a single column is stored. An inheritance definition that 
!    results in a column with more than one datatype would not succeed. 
!    The full inheritance hierarchy is actually a directed acyclic graph of
!    arbitrary complexity. You should note that complex inheritance hierarchies
!    will cause a corresponding increase in parsing time for your queries. 
!    Direct access to the specific table your query requires will always
!    provide the best performance.
!   </para>
! 
!   <para>
!    Table inheritance can currently only be defined using the 
!    <xref linkend="SQL-CREATETABLE"> statement.
!    The related statement CREATE TABLE ... AS SELECT does
!    not allow the specification of an inheritance link. The ALTER TABLE statement
!    does not currently support the INHERITS() clause. There is no
!    way to add an inheritance link to make an existing table into a child
!    table. Similarly, there is no way to remove an inheritance link from a 
!    child table, once defined, other than using DROP TABLE on the child table.
!    A parent table cannot be dropped while any of its children remain. If you 
!    wish to remove a table and all of its descendants, then you can do so 
!    using the CASCADE option of the <xref linkend="SQL-DROPTABLE"> statement.
!   </para>
! 
!   <para>
!    Check constraints can be defined on tables within an inheritance
!    hierarchy. All check constraints on a parent table are automatically
!    inherited by all of their children. Note that is currently possible to 
!    inherit mutually exclusive check constraints, but that definition quickly 
!    shows itself since all attempted row inserts will be rejected.
!   </para>
! 
!   <para>
!    <xref linkend="SQL-ALTERTABLE"> will propogate any changes in data 
!    definition on columns or check constraints down the inheritance hierarchy. 
!    Again, dropping columns or constraints on parent tables is only possible 
!    when using the CASCADE option. ALTER TABLE follows the same rules for 
!    duplicate column merging and rejection that apply during CREATE TABLE.
    </para>
  
    <para>
!    Both parent and child tables can have primary and foreign keys
!    defined for them, so that they can take part normally on both the 
!    referencing and referenced sides of a foreign key constraint. Indexes
!    may be defined on any of these columns whether or not they are inherited.
!    However, a serious current limitation of the inheritance feature is that 
!    indexes (including unique constraints) and foreign key constraints only 
!    apply to single tables and do not also index their inheritance children.  
!    This is true on both sides of a foreign key constraint.  
!    Thus, in the terms of the above example:
  
     <itemizedlist>
      <listitem>
***************
*** 1236,1244 ****
        Similarly, if we were to specify that
        <structname>cities</>.<structfield>name</> <literal>REFERENCES</> some
        other table, this constraint would not automatically propagate to
!       <structname>capitals</>.  In this case you could work around it by
!       manually adding the same <literal>REFERENCES</> constraint to
!       <structname>capitals</>.
       </para>
      </listitem>
  
--- 1303,1313 ----
        Similarly, if we were to specify that
        <structname>cities</>.<structfield>name</> <literal>REFERENCES</> some
        other table, this constraint would not automatically propagate to
!       <structname>capitals</>.  However, it is possible to set up a 
!       foreign key such as <structname>capitals</>.<structfield>name</> 
!       <literal>REFERENCES</> <structname>states</>.<structfield>name</>. 
!       So it is possible to workaround this restriction by manually adding
!       foreign keys to each child table.
       </para>
      </listitem>
  
***************
*** 1255,1260 ****
--- 1324,1722 ----
     but in the meantime considerable care is needed in deciding whether
     inheritance is useful for your problem.
    </para>
+   <note>
+    <title>Deprecated</title> 
+    <para>
+      In previous versions of <productname>PostgreSQL</productname>, the
+      default behavior was not to include child tables in queries. This was
+      found to be error prone and is also in violation of the SQL:2003
+      standard. Under the old syntax, to get the sub-tables you append
+      <literal>*</literal> to the table name.
+      For example
+ <programlisting>
+ SELECT * from cities*;
+ </programlisting>
+      You can still explicitly specify scanning child tables by appending
+      <literal>*</literal>, as well as explicitly specify not scanning child tables by
+      writing <quote>ONLY</quote>.  But beginning in version 7.1, the default
+      behavior for an undecorated table name is to scan its child tables
+      too, whereas before the default was not to do so.  To get the old
+      default behavior, set the configuration option
+      <literal>SQL_Inheritance</literal> to off, e.g.,
+ <programlisting>
+ SET SQL_Inheritance TO OFF;
+ </programlisting>
+      or add a line in your <filename>postgresql.conf</filename> file.
+    </para>
+   </note>
+ 
+   </sect1>
+ 
+   <sect1 id="ce-partitioning">
+    <title>Constraint Exclusion and Partitioning</title>
+ 
+    <indexterm>
+     <primary>partitioning</primary>
+    </indexterm>
+ 
+    <indexterm>
+     <primary>constraint exclusion</primary>
+    </indexterm>
+ 
+    <para>
+     <productname>PostgreSQL</productname> provides features required
+     to support basic table partitioning. This section describes why and how
+     you implement this as part of your database design. 
+    </para>
+ 
+    <para>
+     In &version; the following ways of partitioning are supported:
+    <itemizedlist>
+     <listitem>
+      <para>
+       "Range Partitioning" where the table is partitioned along a "range" 
+       defined by a single column or set of columns, with no overlap between 
+       partitions. Examples might be a date range (e.g. LogDate) or a range of
+       identifiers for particular business objects (e.g. OrderId).
+      </para>
+     </listitem>
+ 
+     <listitem>
+      <para>
+       "List Partitioning" where the table is partitioned by explicitly listing
+       which values relate to each partition. 
+      </para>
+     </listitem>
+ 
+     <listitem>
+      <para>
+       Mixed range and list partitioning. No restrictions are placed on the 
+       number or complexity of the partitioning constraints.
+      </para>
+     </listitem>
+ 
+     <listitem>
+      <para>
+       Hash partitioning is not currently supported.
+      </para>
+     </listitem>
+    </itemizedlist>
+    </para>
+ 
+    <para>
+     Currently, partitioning is implemented in conjunction with table
+     inheritance only, though using fully SQL:2003 compliant syntax. 
+     Table inheritance gives us the ability to split up a table into partitions
+     and the Constraint Exclusion feature provides the ability to selectively
+     bring partitions back together in response to particular SQL. 
+     You should be familiar with Inheritance (see <xref linkend="ddl-inherit">)
+     before attempting to implement partitioning.
+    </para>
+ 
+ 
+    <sect2 id="ce-partitioning-implementation">
+      <title>Implementing Partitioning</title>
+ 
+     <para>
+      Partitioning a table is a straightforward process.  The
+      reasons and ways to use partitioning are a rather more complex topic 
+      and will be addressed primarily through the examples in this section.
+     </para>
+  
+     <para>
+      <orderedlist spacing=compact>
+       <listitem>
+        <para>
+         Create a the "master" table, from which all of the partitions will
+         inherit. 
+        </para>
+        <para>
+         This table will contain no data.  Do not define any 
+         constraints or keys on this table at this time, unless you intend them
+         to be applied equally to all partitions.
+        </para>
+       </listitem>
+ 
+       <listitem>
+        <para>
+         Create several "child" tables using table inheritance.  These child
+         tables are the partitions. It is advisable to devise a progressive naming
+         scheme for the partitions.
+        </para>
+       </listitem>
+ 
+       <listitem>
+        <para>
+         Add table constraints to define the allowed values in each partition.
+        </para>
+        <para>
+         Only clauses of the form [COLUMN] [OPERATOR] [CONSTANT(s)] will be used
+         for constraint exclusion. Simple examples would be:
+ <programlisting>
+ CHECK ( x = 1 )
+ CHECK ( county IN ('Oxfordshire','Buckinghamshire','Warwickshire'))
+ CHECK ( outletID BETWEEN 1 AND 99 )
+ </programlisting>
+ 
+         These can be linked together with boolean operators AND and OR to 
+         form complex constraints. Note that there is no difference in syntax
+         between Range and List Partitioning mechanisms; those terms are
+         descriptive only. Ensure that the set of values in each child table 
+         do not overlap.
+        </para>
+       </listitem>
+ 
+       <listitem>
+        <para>
+         Add any other indexes you want on the partitions, bearing in mind
+         that it is always more efficient to add them after data has been
+         loaded into the partition.
+        </para>
+       </listitem>
+ 
+       <listitem>
+        <para>
+         (optional) Define a RULE or trigger which tells the database which
+         partition in which to put new data.
+        </para>
+       </listitem>
+ 
+      </orderedlist>
+     </para>
+ 
+     <para>
+      Note that partitions are normal PostgreSQL tables, so all the normal 
+      conditions apply as specified in the rest of this manual.
+     </para>
+ 
+     <para>
+      Now lets look at the process again using a worked example of a database
+      for an ice cream company. The company measures peak temperatures every day
+      as well as ice cream sales in each region. They have two tables:
+   
+ <programlisting>
+ CREATE TABLE cities (
+     id              int not null,
+     name            text not null,
+     altitude        int     -- (in ft)
+ );
+ 
+ CREATE TABLE measurement (
+     city_id         int not null
+     logdate         date not null,
+     peaktemp        int
+     unitsales       int
+ );
+ </programlisting>
+ 
+      As time goes on, sales data is entered. Storing data costs money, 
+      yet older data has less value as ice cream fashions change, so we decide
+      to keep only 3 years data on a rolling 36 months. Each month we remove 
+      the oldest month's data.
+     </para>
+ 
+     <para>
+      Most queries just access the last week, month or quarter's data,
+      since we need to keep track of sales. As a result we have a large table,
+      yet only the most frequent 10% is accessed. Most of these queries
+      are online reports for various levels of management. These queries access
+      much of the table, so it is difficult to build enough indexes and at
+      the same time allow us to keep loading all of the data fast enough. 
+      Yet, the reports are online so we need to respond quickly.
+     </para>
+ 
+     <para>
+      In this situation we can use partitioning to help us optimize the design 
+      to cope with the demands placed upon the measurement table.
+     </para>
+ 
+     <para>
+      (Step 1) The measurement table is our master table. 
+     </para>
+ 
+     <para>
+      (Step 2) Next we create 36 month-size partitions, using the PostgreSQL
+      inheritance feature like this:
+     
+ <programlisting>
+ CREATE TABLE measurement_yy04mm02 ( ) INHERITS (measurement);
+ CREATE TABLE measurement_yy04mm03 ( ) INHERITS (measurement);
+ ...
+ CREATE TABLE measurement_yy05mm11 ( ) INHERITS (measurement);
+ CREATE TABLE measurement_yy05mm12 ( ) INHERITS (measurement);
+ CREATE TABLE measurement_yy06mm01 ( ) INHERITS (measurement);
+ </programlisting>
+ 
+      Each of the tables are complete tables in their own right, but they 
+      inherit their definition from the measurement table.
+     </para>
+ 
+     <para>
+      This solves one of our problems: deleting older data. Each month, all
+      we need to do is perform a DROP TABLE on the oldest table and create
+      a new table to insert into. 
+     </para>
+ 
+     <para>
+      (Step 3) We now add non-overlapping table constraints, so that our table
+      creation script becomes:
+ 
+  <programlisting>
+ CREATE TABLE measurement_yy04mm02 ( 
+     CHECK ( logdate >= DATE '2004-02-01' AND logdate < DATE '2004-03-01' )
+                                     ) INHERITS (measurement);
+ CREATE TABLE measurement_yy04mm03 ( 
+     CHECK ( logdate >= DATE '2004-03-01' AND logdate < DATE '2004-04-01' )
+                                     ) INHERITS (measurement);
+ ...
+ CREATE TABLE measurement_yy05mm11 ( 
+     CHECK ( logdate >= DATE '2005-11-01' AND logdate < DATE '2005-12-01' )
+                                     ) INHERITS (measurement);
+ CREATE TABLE measurement_yy05mm12 ( 
+     CHECK ( logdate >= DATE '2005-12-01' AND logdate < DATE '2006-01-01' )
+                                     ) INHERITS (measurement);
+ CREATE TABLE measurement_yy06mm01 ( 
+     CHECK ( logdate >= DATE '2006-01-01' AND logdate < DATE '2006-02-01' )
+                                     ) INHERITS (measurement);
+ </programlisting>
+ 
+      Note that we use constants in the CHECK constraints rather than a 
+      calculation involving INTERVALS. 
+     </para>
+ 
+     <para>
+      (Step 4) We choose not to add further indices at this time.
+     </para>
+ 
+     <para>
+      (Step 5) Data will be added each day to the latest partition. This allows
+      us to set up a very simple RULE to insert data, which we change once per
+      month so that it always points to the current partition.
+ 
+ <programlisting>
+ CREATE RULE measurement_current_partition AS 
+ ON INSERT
+ TO MEASUREMENT
+ DO INSTEAD 
+     INSERT INTO measurement_yy05mm11 VALUES ( NEW.city_id
+                                              ,NEW.logdate
+                                              ,NEW.peaktemp
+                                              ,NEW.unitsales );
+ </programlisting>
+ 
+     We might want to insert data and have the server automatically 
+     locate the partition into which the row should be added. We could do this
+     with a more complex set of RULEs as shown below.
+ 
+ <programlisting>
+ CREATE RULE measurement_insert_yy04mm02 AS 
+ ON INSERT
+ TO MEASUREMENT WHERE
+     ( logdate >= DATE '2004-02-01' AND logdate < DATE '2004-03-01' )
+ DO INSTEAD 
+     INSERT INTO measurement_yy04mm02 VALUES ( NEW.city_id
+                                              ,NEW.logdate
+                                              ,NEW.peaktemp
+                                              ,NEW.unitsales );
+ CREATE RULE measurement_insert_yy04mm03 AS 
+ ON INSERT
+ TO MEASUREMENT WHERE
+     ( logdate >= DATE '2004-03-01' AND logdate < DATE '2004-04-01' )
+ DO INSTEAD 
+     INSERT INTO measurement_yy04mm03 VALUES ( NEW.city_id
+                                              ,NEW.logdate
+                                              ,NEW.peaktemp
+                                              ,NEW.unitsales );
+ ...
+ CREATE RULE measurement_insert_yy06mm01 AS 
+ ON INSERT
+ TO MEASUREMENT WHERE
+     ( logdate >= DATE '2006-01-01' AND logdate < DATE '2006-02-01' )
+ DO INSTEAD 
+     INSERT INTO measurement_yy06mm01 VALUES ( NEW.city_id
+                                              ,NEW.logdate
+                                              ,NEW.peaktemp
+                                              ,NEW.unitsales );
+ </programlisting>
+ 
+      Note that the WHERE clauses in each RULE exactly match those used
+      for the CHECK constraints on each partition. 
+     </para>
+ 
+     <para>
+      As we can see, a complex partitioning scheme could require a substantial
+      amount of DDL. In the above example we would be creating a new partition
+      each month, so it would be advisable to create a script/program that 
+      generates the SQL DDL required automatically rather than manually
+      maintaining all of the required code.
+     </para>
+ 
+    <para>
+     The following caveats apply:
+    <itemizedlist>
+     <listitem>
+      <para>
+       There is currently no way to specify that all of the CHECK constraints
+       are mutually exclusive. Care is required by the database designer.
+      </para>
+     </listitem>
+ 
+     <listitem>
+      <para>
+       There is currently no way to specify that rows may not be inserted
+       into the master table. If you define a CHECK constraint on the master
+       table, then this constraint will be inherited by all child tables. All
+       queries will then return no rows.
+      </para>
+     </listitem>
+ 
+     <listitem>
+      <para>
+       For some datatypes you must explicitly coerce the constant values
+       into the datatype of the column. The following constraint will 
+       work if x is an INTEGER datatype, but not if x is BIGINT datatype.
+ <programlisting>
+ CHECK ( x = 1)
+ </programlisting>
+       For BIGINT we must use a constraint like:
+ <programlisting>
+ CHECK ( x = 1::bigint)
+ </programlisting>
+      </para>
+     </listitem>
+    </itemizedlist>
+    </para>
+ 
+ 
+    </sect2>
+ 
+    <sect2 id="ce-constraint-exclusion-query">
+      <title>Constraint Exclusion in Queries</title>
+ 
+    <para>
+     The following caveats apply:
+    <itemizedlist>
+     <listitem>
+      <para>
+       UPDATE/DELETE 
+      </para>
+     </listitem>
+ 
+     <listitem>
+      <para>
+      </para>
+     </listitem>
+ 
+     <listitem>
+      <para>    
+      </para>
+     </listitem>
+    </itemizedlist>
+    </para>
+ 
+ 
+    </sect2>
+ 
   </sect1>
  
   <sect1 id="ddl-alter">

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org

[PATCHES] Partitioning docs WIP

Reply via email to