[HACKERS] Replication documentation addition

Bruce Momjian Mon, 23 Oct 2006 20:40:11 -0700

Here is my first draft of a new replication section for our
documentation.  I am looking for any comments.


---------------------------------------------------------------------------

Replication
===========

Database replication allows multiple computers to work together, making
them appear as a single computer to user applications.  This might
involve allowing a backup server to take over if the primary server
fails, or it might involve allowing several computers to work together
at the same time.

It would be ideal if database servers could be combined seamlessly.  Web
servers serving static web pages can be combined quite easily by merely
load-balancing web requests to multiple machines.  In fact, most
read-only servers can be combined relatively easily.

Unfortunately, most database servers have a read/write mix of requests,
and read/write servers are much harder to combine.  This is because
though read-only data has to be placed on each each server only once, a
write to any server has to be seen by all other servers so that future
read requests to those servers return consistent results.  

This "sync problem" is the fundamental difficulty of doing database
replication.  Because there is no single solution that limits the impact
of the sync problem for all workloads, there are multiple replication
solutions.  Each solution addresses the sync problem in a different way,
and minimizes its impact for a specific workload.  

This section first outlines two important replication capabilities, and
then outlines various replication solutions.

Sychronous vs. Asynchronous Replication
---------------------------------------

The term sychronous replication means that a query is not considered
committed unless all servers have access to the committed records.  In
that case, a failover to a backup server will lose no data records. 
Asynchronous replication has a small delay between the time of commit
and its propogation to backup servers, opening the possibility that some
transactions might be lost in a switch to a backup server.  Asynchronous
is used when sychronous replication would be too slow.

Full vs. Partial Replication
----------------------------

The term full replication means only a full database cluster can be
replicated, while partial replication means more fine-grained control
over replicated objects is possible.

Shared Disk Failover
-------------------- 

This replication solution avoids the sync problem by having only one
copy of the database.  This is possible because a single disk array is
shared by multiple servers.  If the main database server fails, the
backup server is able to mount and start the database as though it was
restarting after a database crash.  This shared hardware functionality
is common in network storage devices.  This allows sychronous, full
replication.

Warm Standby Using Point-In-Time Recovery
-----------------------------------------

A warm standby server (add doc xref) can be kept current by reading a
stream of WAL records.  If the main server fails, the warm standby
contains almost all of the data as the main server, and can be used as
the new database server.  This allows asychronous, full replication.

Point-In-Time Recovery  [Asychronous, Full]
----------------------

A Point-In-Time Recovery is the same as a Warm Standby server except
that the standby server must go though a full restore and archive
recovery operation, delaying how quickly it can be used as the main
database server.  This allows asychronous, full replication.

Continuously Running Failover Server
------------------------------------

A continuously running failover server allows the backup server to
answer read-only queries while the master server is running.  It
receives a continuous stream of write activity from the master server. 
Because the failover server can be used for read-only database requests,
it is ideal for data warehouse queries. Slony offers this as
asychronous, partial replication.

Data Partitioning
-----------------

Data partitioning partitions the database into data sets.  To achieve
replication, each data set can only be modified by one server.  For
example, data can be partitioned by main office, e.g. London and Paris. 
While London and Paris servers have all data records, only London can
modify London records, and Paris can only modify Paris records.  Such
partitioning is usually accomplished in application code, though rules
and triggers can help enforce such partitioning and keep the read-only
data sets current.  Slony can also be used in such a setup.  While Slony
replicates only entire tables, London and Paris can be placed in
separate tables, and inheritance can be used to pull from both tables at
the same time.

Query Broadcast Replication
---------------------------

This involves sending write queries to multiple servers.  Read-only
queries can be sent to a single server because there is no need for all
servers to process it.   This can be complex to setup because functions
like random() and CURRENT_TIMESTAMP will have different values on
different servers, and sequences should be consistent across servers.
Pgpool implements this type of replication.

Multi-Master Replication
------------------------

In multi-master replication, each server can accept write requests, and
these write requests are broadcast to all other servers before the
transaction commits.  Under heavy load, this type of replication can
cause excessive locking and performance degradation.  It is implemented
by Oracle in their RAC product.  PostgreSQL does not offer this type of
replication, though PostgreSQL two-phase commit can be used to implement
this in application code.

Performance
-----------
Performance must be considered in any repliacation choice.  There is
usually a tradeoff between functionality and performance.  For example,
full sychronousreplication over a slow network might cut performance by
more than half, while asynchronous replication might have a minimal
performance imact.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match

[HACKERS] Replication documentation addition

Reply via email to