Re: [HACKERS] Some questions about mammoth replication

Alexey Klyukin Fri, 12 Oct 2007 04:58:42 -0700

Hannu Krosing wrote:

> > We don't use either a log table in database or WAL. The data to
> > replicate is stored in disk files, one per transaction.
> 
> Clever :)
> 
> How well does it scale ? That is, at what transaction rate can your
> replication keep up with database ?


This depends on the number of concurrent transactions (the data is
collected by every backend process), the maximum transaction size, etc.
I don't have numbers here, sorry.

> 
> >  As Joshua said,
> > the WAL is used to ensure that only those transactions that are recorded
> > as committed in WAL are sent to slaves.
> 
> How do you force correct commit order of applying the transactions ?

The first transaction that is committed in PostgreSQL is the first
transaction placed into the queue, and the first that is restored by the
slave.
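
As a rough illustration of the ordering rule above (a minimal sketch, not
Replicator's actual code; the SendQueue class and its method names are
hypothetical), a FIFO that is appended to at commit time hands transactions
to the slave in commit order:

```python
from collections import deque


class SendQueue:
    """Hypothetical FIFO of committed transactions (illustration only)."""

    def __init__(self):
        self._queue = deque()

    def enqueue_committed(self, xid):
        # Called at commit time, so queue order equals commit order.
        self._queue.append(xid)

    def drain(self):
        # Transactions are handed out in exactly the order they were queued.
        while self._queue:
            yield self._queue.popleft()


queue = SendQueue()
# Transactions may start in any order; only the commit order matters here.
for xid in (103, 101, 102):
    queue.enqueue_committed(xid)

restored = list(queue.drain())
print(restored)
```

The transaction ids here are made up; the point is only that the restore
order on the slave follows the order of commits, not the order in which
transactions started.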

> 
> > > 
> > > > > Do you make use of snapshot data, to make sure, what parts of WAL log
> > > > > are worth migrating to slaves , or do you just apply everything in WAL
> > > > > in separate transactions and abort if you find out that original
> > > > > transaction aborted ?
> > > > 
> > > > We check if a data transaction is recorded in WAL before sending
> > > > it to a slave. For an aborted transaction we just discard all data
> > > > collected from that transaction.
> > > 
> > > Do you duplicate postgresql's MVCC code for that, or will this happen
> > > automatically via using MVCC itself for collected data ?
> > 
> > Every transaction command that changes data in a replicated relation is
> > stored on disk. PostgreSQL MVCC code is used on a slave in a natural way
> > when transaction commands are replayed there.
> 
> Do you replay several transaction files in the same transaction on
> slave ?

> Can you replay several transaction files in parallel ?

No. We have plans for concurrent restore of replicated data, but
currently a single slave process is responsible for restoring data
received from MCP.

> 
> > > How do you handle really large inserts/updates/deletes, which change say
> > > 10M rows in one transaction ?
> > 
> > We produce really large disk files ;). When a transaction commits - a
> > special queue lock is acquired and transaction is enqueued to a sending
> > queue. 
> > Since the locking mode for that lock is exclusive a commit of a
> > very large transaction would delay commits of other transactions until
> > the lock is held. We are working on minimizing the time of holding this
> > lock in the new version of Replicator.
> 
> Why does it take longer to queue a large file ? do you copy data from
> one file to another ?

Yes, currently the data is copied from the transaction files into the
queue (this doesn't apply to dump transactions).

However, we have recently changed this: the new code acquires the
queue lock only to record the transaction as committed in the
replication log, without moving the data.
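
To make the difference concrete, here is a minimal sketch of the two
approaches. It is not the Replicator code: the function names, the in-memory
lists standing in for the queue and the replication log, and the use of a
Python lock in place of the exclusive queue lock are all assumptions made for
illustration.

```python
import threading

# Stands in for the exclusive queue lock (assumption for illustration).
queue_lock = threading.Lock()


def commit_with_copy(queue, xid, txn_data):
    # Older behaviour as described: the transaction data is copied into the
    # queue while the lock is held, so hold time grows with transaction size.
    with queue_lock:
        queue.append((xid, bytes(txn_data)))  # bytes() makes a full copy


def commit_with_log_record(replication_log, xid):
    # Newer behaviour as described: only a small commit record is written
    # under the lock; the transaction file itself is not moved.
    with queue_lock:
        replication_log.append(xid)


queue, replication_log = [], []
big_txn = bytearray(1_000_000)  # pretend contents of a transaction file

commit_with_copy(queue, 1, big_txn)
commit_with_log_record(replication_log, 2)

copied_bytes = len(queue[0][1])
print(copied_bytes, replication_log)
```

In the first function the work done under the lock is proportional to the
size of the transaction; in the second it is a single small append, which is
why other commits should wait much less behind a very large transaction.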

> > > Do you replay it as SQL insert/update/delete commands, or directly on
> > > heap/indexes ?
> > 
> > We replay the commands directly using heap/index functions on a slave.
> 
> Does that mean that the table structures will be exactly the same on
> both master and slave ?

Yes, the table structure on the slaves should match the table structure
on master.

> That is, do you replicate a physical table image
> (maybe not including transaction ids on master) ?

Yes, we call this a 'full dump', and it is triggered automatically for
every replicated table. However, we replicate only data, not the DDL
commands that create or alter a table or sequence.

> 
> Or you just use lower-level versions on INSERT/UPDATE/DELETE ?
> 
> ---------------------
> Hannu
> 
> 
> 

Regards,
-- 
Alexey Klyukin                         http://www.commandprompt.com/
The PostgreSQL Company - Command Prompt, Inc.

