Re: [PERFORM] hardware performance and some more

William Yu Thu, 24 Jul 2003 09:42:16 -0700

| first of all I would like to learn that, any of you use the postgresql
| within the clustered environment? Or, let me ask you the question, in
| different manner, can we use postgresql in a cluster environment? If
| we can do what is the support method of the postgresql for clusters?

You could do active-active but it would require work on your end. I did a recent check on all the Postgres replication packages and they all seem to be single master -> single/many slaves. Updating on more than 1 server looks to be problematic. I run an active-active now but I had to develop my own custom replication strategy.

As a background, we develop & host web-based apps that use Postgres as the DB engine. Since our clients access our server over the internet, uptime is a big issue. Hence, we have two server farms: one colocated in San Francisco and the other in Sterling, VA. In addition to redudancy, we also wanted to spread the load across the servers. To do this, we went with the expedient method of 1-minute DNS zonemaps where if both servers are up, 70% traffic is sent to the faster farm and 30% to the other. Both servers are constantly monitored and if one goes down, a new zonemap is pushed out listing only the servers that are up.

The first step in making this work was converting all integer keys to character keys. By making keys into characters, we could prepend a server location code so ID 100 generated at SF would not conflict with ID 100 generated in Sterling. Instead, they would be marked as S00000100 and V00000100. Another benefit is the increase of possible key combinations by being able to use alpha characters. (36^(n-1) versus 10^n)

At this time, the method we use is a periodic sweep of all updated records. In every table, we add extra fields to mark the date/time the record was last inserted/updated/deleted. All records touched as of the last resync are extracted, zipped up, pgp-encrypted and then posted on an ftp server. Files are then transfered between servers, records unpacked and inserted/updated. Some checks are needed to determine what takes precedence if users updated the same record on both servers but otherwise it's a straightforward process.

As far as I can tell, the performance impact seems to be minimal. There's a periodic storm of replication updates in cases where there's mass updates sync last resync. But if you have mostly reads and few writes, you shouldn't see this situation. The biggest performance impact seems to be the CPU power needed to zip/unzip/encrypt/decrypt files.

I'm thinking over strats to get more "real-time" replication working. I suppose I could just make the resync program run more often but that's a bit inelegant. Perhaps I could capture every update/delete/insert/alter statement from the postgres logs, parsing them out to commands and then zipping/encrypting every command as a separate item to be processed. Or add triggers to every table where updated records are pushed to a custom "updated log".

The biggest problem is of course locks -- especially at the application level. I'm still thinking over what to do here.


---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [PERFORM] hardware performance and some more

Reply via email to