We have had four databases serving our web site, but due to licensing issues, we have had to take two out of production, and we are looking to bring those two onto PostgreSQL very quickly, with an eye toward moving everything in the longer term. The central web DBs are all copies of the same data, drawn from 72 servers at remote locations. We replicate modifications made at these 72 remote sites real-time to all central servers.
On each central server, there are 352 tables and 412 indexes holding about 700 million rows, taking almost 200 GB of disk space. The largest table has about 125 million of those rows, with several indexes. There are about 3 million database transactions modifying each central database every day, with each transaction typically containing many inserts and/or updates -- deletes are sparse. During idle time the replication process compares tables in the source databases to the central databases to log any differences and correct the central copies. To support the 2 million browser and SOAP hits per day, the web sites spread about 6 million SELECT statements across available central servers, using load balancing. Many of these queries involve a 10 or more tables with many subqueries; some involve unions.
The manager of the DBA team is reluctant to change both the OS and the DBMS at the same time, so unless I can make a strong case for why it is important to run postgresql under Linux, we will be running this on Windows. Currently, there are two Java-based middle tier processes running on each central database server, one for the replication and one for the web. We expect to keep it that way, so the database needs to play well with these processes.
I've been reading everything I can find on postgresql configuration, but would welcome any specific suggestions for this environment. I'd also be really happy to hear that we're not the first to use postgresql with this much data and load.
Thanks for any info you can provide.