We are building a data warehouse composed of essentially click stream data. The DB is growing fairly quickly as to be expected, currently at 90GB for one months data. The idea is to keep 6 months detailed data on line and then start aggregating older data to summary tables. We have 2 fact tables currently, one with about 68 million rows and the other with about 210 million rows. Numerous dimension tables ranging from a dozen rows to millions.
We are currently running on a Dell 2650 with 2 Xeon 2.8 processors in hyper-threading mode, 4GB of ram, and 5 SCSI drives in a RAID 0, Adaptec PERC3/Di, configuration. I believe they are 10k drives. Files system is EXT3. We are running RH9 Linux kernel 2.4.20-20.9SMP with bigmem turned on. This box is used only for the warehouse. All the ETL work is done on this machine as well. DB version is postgreSQL 7.4.
We are running into issues with IO saturation obviously. Since this thing is only going to get bigger we are looking for some advice on how to accommodate DB's of this size.
First question is do we gain anything by moving the RH Enterprise version of Linux in terms of performance, mainly in the IO realm as we are not CPU bound at all? Second and more radical, has anyone run postgreSQL on the new Apple G5 with an XRaid system? This seems like a great value combination. Fast CPU, wide bus, Fibre Channel IO, 2.5TB all for ~17k.
I keep see references to terabyte postgreSQL installations, I was wondering if anyone on this list is in charge of one of those and can offer some advice on how to position ourselves hardware wise.
---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly