Hi
I've updated (or rather rewritten) my proposal for implementing
read-onyl queries on PITR slaves as a Summer of Code 2007 project.
I've added are more details description of how I plan implement
a read-only mode suitable for PITR slaves, and put in a few
possible enhancements to the Big, Global R/W lock idea for
serializing WAL replay and queries.
I'm looking forward to any kind of suggestions, ideas, or
critism - I'd like my proposal to be as detailed as
possible before I submit it to SoC, so that if
I get a chance to work on it, I can be reasonable sure
that people here are happy with the way I approach the problem.
greetings, Florian Pflug
Implementing support for read-only queries on PITR slaves
=
Submitter: Florian Pflug [EMAIL PROTECTED]
Abstract:
-
The support for PITR (Point-In-Time-Recovery) in postgres can be used to built
a simple form a master-slave replication. Currently, no queries can be
executed on the slave, though - it only replays WAL (Write-Ahead-Log) segments
it receives from the master. I want to implement support for running read-onyl
queries on such a PITR slave, making PITR usefull not only for disaster
recovery, but also for loadbalancing.
Course overview of the proposed implementation:
---
Currently, postgres does WAL replay soley during the startup of the database,
before all subsystems are fully initialized, and before backend are allowed to
connect. To support read-only queries on PITR slaves, while still guaranteeing
that the database is in a consistens state, the WAL replay will be split into
two parts. The first will replay only enough wal to guarantee a consistens
state, and will run during startup. If read-only mode is disabled, the next
step will be run immediatly after the first. If, however, read-only mode is
enabled, then the database will be brought online in read-only mode after
completing recovery, and the second step will be lauched as a seperate
process. Clients are allowed to connect, and to execute read-only queries as
soon as the database is online, even though WAL replay is still being done in
the background.
Implementation of a read-only mode suitable for PITR slaves
---
Since replication via PITR runs asynchrously, and runs one-way (master-slave),
queries running on the slave are of course not allowed to insert, update or
delete data, nor to change the schema in any way. But there are still write
operations in the datadir that they _are_ allowed to do.
Those are
.) Creating temporary files for on-disk sorting and spilling out
tuplestores
.) Setting XMIN_COMMITTED and XMAX_COMITTED on heap tuples
.) Setting LP_DELETE on index tuples.
Note that creating temporary tables is not allowed. This is necessary
because temporary tables have associated entries in pg_class,
which obviously can't be created on PITR slaves.
Postgres already supports set transaction read only during normal operation.
On a read-only PITR slave every transaction will automatically be flagged
read-only, which results in nice error messages (like ERROR: transaction is
read-only) if a user tries to execute insert/updates/deletes or
schema-changing operations. Also any command that has to be execute outside of
a transaction block (VACUUM) is disallowed on PITR slaved. As an additional
protection, a global variable read_only_mode is introduced. If in read-only
mode, this is set to true for all backends except the WAL replay process, and
the following checks are added.
.) MarkBufferDirty() is changed to throw an error if read_only_mode == 1
Hint bit updates already use SetBufferCommitInfoNeedsSave() instead of
MarkBufferDirty(), which just suits use fine.
.) XLogInsert() and XLogWrite() throws if read_only_mode == 1
.) SlruPhysicalWritePage() and SimpleLruWritePage() throws if read_only_mode
== 1
This prevents creating or changing multixacts, subtrans and clog entries.
.) EndPrepare() and FinishPreparedTransaction() throws if read_only_mode == 1
This prevents preparing transaction, and committing/rolling-back prepared
transactions.
Those checks serve as a safety measure against holes in the already existing
read-only transaction logic. Note that read-onyl transactions won't generate
clog updates, because those are already skipped for transactions that neither
wrote xlog entries, temporary tables nor deleted files.
The following holes currently exist in the read-only transaction logic. Fixing
those is not critical - the lowlevel checks outlined above catch them all -
but would allow displaying better error messages.
.) nextval(), setval()
.) CLUSTER
.) NOTIFY
Disallowing those in all read-only transactions (Not only on PITR slaves) seems
sensible, but it might create compatibility problems.
Allowing read-only queries and WAL archive replay to run side-by-side