Re: [ADMIN] replication/redundancy

Jonathan Gardner Tue, 01 Jul 2003 13:04:38 -0700

On Monday 30 June 2003 09:17, [EMAIL PROTECTED] wrote:
> On Mon, Jun 30, 2003 at 08:31:09AM -0700, Jonathan Gardner wrote:
>
> * currently only an explicit sync-out is supported - from time to time
>   every table has to be scanned for new records


So you are using "lazy" rather than "eager" replication. I am sure you know 
the limitations of lazy replication. Let me enumerate them here for those of 
you who aren't familiar with them:

  1) The data is not consistent. If you run the same select query at the same 
time on the two databases, you may get different results. For some 
situations that is okay (like Usenet). For others it is not (like 
registrations: you sign up on one database, but you don't appear on the 
other until the next sync-out runs).

  2) The "other" process that does the synchronization is serial in nature. 
The processes that change the database are parallel in nature. It is very 
possible to have changes happening to the database faster than you can 
replicate them. This was a real problem at a web company I recently worked 
for that used lazy replication. Their backup database fell weeks behind the 
live database. It almost got to the point where recreating the entire 
database would've been faster than waiting for the replication process to 
catch up.

  3) These two factors make using the second database as a hot-swappable 
backup risky at best. You will lose some data when you switch to the backup, 
unless changes to the database are so rare that the backup is usually up to 
date. If that is the case, you probably don't need the backup in the first 
place, because databases that don't do much tend not to be very important.
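
To make points 2 and 3 concrete, here is a toy sketch in Python. It is not 
anyone's actual replication code; the function name and the rates are made up 
for illustration. It only counts how many changes are still waiting to be 
replicated when parallel writers produce changes faster (or slower) than a 
serial sync process can apply them.

```python
def replication_backlog(writes_per_tick, applied_per_tick, ticks):
    """Return the number of unreplicated changes after each tick.

    Hypothetical model: writers add a fixed number of changes per tick,
    and a single serial replicator applies at most a fixed number per tick.
    """
    backlog = 0
    history = []
    for _ in range(ticks):
        backlog += writes_per_tick                 # parallel writers add changes
        backlog -= min(backlog, applied_per_tick)  # serial replicator drains what it can
        history.append(backlog)
    return history


# Illustrative rates only, not measurements from any real system.
falling_behind = replication_backlog(writes_per_tick=120, applied_per_tick=100, ticks=5)
keeping_up = replication_backlog(writes_per_tick=80, applied_per_tick=100, ticks=5)

print(falling_behind)  # [20, 40, 60, 80, 100]
print(keeping_up)      # [0, 0, 0, 0, 0]
```

Whatever is in the backlog at the moment you fail over is the data you lose, 
which is why a backup that keeps falling behind is a poor hot spare.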

>
> * currently no real conflict handling
>

What he is talking about here is what happens when two separate processes are 
working on the same rows. PostgreSQL uses transactions and locking right now, 
so two processes on the same system cannot overwrite each other's changes 
this way. However, his system has no way to handle the conflict when the two 
processes are on separate machines.

The most obvious problem with this comes from incrementing a column. If both 
processes try to increment the same column, the column ends up incremented 
by one or the other, but not both. This would be bad for things like PayPal, 
where your account would only increase by one of the two transfers, rather 
than both, if the two occurred at the same time.
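
Here is a minimal sketch of that lost update in Python. The two dicts stand 
in for the two databases, and sync_last_writer_wins is a hypothetical stand-in 
for a sync step with no conflict handling. I am assuming it simply copies the 
row across, which may not be exactly what his tool does.

```python
def sync_last_writer_wins(source, target, key):
    """Naive sync (assumed behavior): copy the source value over the target's
    value without checking whether the target changed in the meantime."""
    target[key] = source[key]


# Both sites start from the same replicated row.
site_a = {"balance": 100}
site_b = {"balance": 100}

# Two transfers arrive at the same time, one at each site.
site_a["balance"] += 30
site_b["balance"] += 50

# The sync-out from site A overwrites site B's change.
sync_last_writer_wins(site_a, site_b, "balance")

correct_balance = 100 + 30 + 50
print(site_a["balance"], site_b["balance"], correct_balance)  # 130 130 180
```

Both sites now agree on 130, so nothing looks wrong, but the 50 transfer is 
gone. On a single PostgreSQL server the second update would have waited for 
the row lock and then applied on top of the first.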

>
> perhaps we can improve this a little bit.
>

I would hope you spend some time researching what others have done. Relational 
databases are an area in which a tremendous amount of solid research has 
already occurred. Applying yourself to understand the research and projects 
that have gone before you will save you a lot of time replicating their work. 
In other words, "If I have seen farther, it is because I have stood on the 
shoulders of giants" to (mis?)quote Newton.

Again, to re-emphasize why pgreplication is so cool and why everyone should be 
excited about this:
  1) Database theory says that scalable, eager replication is impossible. 
This is true in practice.
  2) The Postgres-R team discovered a way to make scalable, eager replication 
work. The restriction is that locks, once granted, may be aborted or revoked.
  3) This means you will one day be able to set up a Beowulf-type cluster of 
Postgres databases that will rival the most powerful databases on earth 
today.

-- 
Jonathan Gardner <[EMAIL PROTECTED]>
(was [EMAIL PROTECTED])
Live Free, Use Linux!
