I just discussed my problem on IRC. I am building a Web usage mining system based on Linux, PostgreSQL and C++, made up of an OLTP database which feeds several multi-purpose data warehouses about users' behaviour on HTTP servers.
I modelled every warehouse using the star schema, with a fact table and 'n' dimension tables linked via surrogate IDs.
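As a concrete illustration, a minimal star schema for one such warehouse might look like the following (all table and column names are hypothetical, not from my actual system):

```sql
-- Hypothetical dimension tables, each with an INTEGER surrogate key
CREATE TABLE dim_page (
    page_id  INTEGER PRIMARY KEY,   -- surrogate ID
    url      TEXT NOT NULL
);

CREATE TABLE dim_date (
    date_id  INTEGER PRIMARY KEY,   -- surrogate ID
    day      DATE NOT NULL
);

-- Fact table referencing the dimensions by surrogate ID only,
-- with no FOREIGN KEY constraints declared
CREATE TABLE fact_hits (
    page_id  INTEGER NOT NULL,
    date_id  INTEGER NOT NULL,
    hits     INTEGER NOT NULL
);
```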
From the discussion on the channel, I came up with these conclusions regarding the warehouse's performance:
1) don't use referential integrity in the fact table
2) use INTEGER, and avoid SMALLINT and NUMERIC, for the dimensions' IDs
3) create an index on every dimension-ID column in the fact table
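Point 3 in practice, assuming a fact table called fact_hits with dimension-ID columns page_id and date_id (names are my own, for illustration):

```sql
-- One index per dimension-ID column in the fact table,
-- so star-join queries can restrict on any dimension efficiently
CREATE INDEX fact_hits_page_id_idx ON fact_hits (page_id);
CREATE INDEX fact_hits_date_id_idx ON fact_hits (date_id);
```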
As far as administration is concerned: run VACUUM ANALYSE daily and VACUUM FULL periodically.
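For the maintenance schedule, a crontab sketch could look like this (database name and times are placeholders, and VACUUM FULL locks the tables it processes, so it should run in a quiet window):

```
# Daily VACUUM ANALYSE at 03:00; VACUUM FULL on the 1st of each month at 04:00
0 3 * * * psql -d warehouse -c 'VACUUM ANALYSE;'
0 4 1 * * psql -d warehouse -c 'VACUUM FULL;'
```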
Is there anything else I should keep in mind?
Also, I was looking for advice regarding hardware requirements for a data warehouse system that needs to satisfy online queries. I have no idea at the moment. I can only predict about 4 million records a month in the fact table; does that figure make sense, or is it too much?
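For a rough capacity estimate from that figure (the per-row width is purely my assumption, not a measured value):

```python
# Back-of-envelope sizing for the fact table: 4 million rows per month.
rows_per_month = 4_000_000
rows_per_year = rows_per_month * 12  # 48 million rows/year

# Assume ~50 bytes per fact row (a few INTEGER keys plus measures
# plus PostgreSQL tuple overhead) -- an assumption, not a measurement.
bytes_per_row = 50
gb_per_year = rows_per_year * bytes_per_row / 1024**3

print(rows_per_year)          # 48000000
print(round(gb_per_year, 2))  # roughly 2.24 GB/year, before indexes
```

So the raw fact data itself is modest; the indexes from point 3 and any aggregate tables would add to that.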
Data needs to be easily backed up and eventually replicated.
With this in mind, what hardware architecture should I look for? How many hard disks do I need, of what kind, and what RAID solution do you suggest I adopt (RAID 5 or 10, I think)?
Thank you so much,
Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member, ht://Check maintainer
Current Location: Prato, Toscana, Italia
[EMAIL PROTECTED] | http://www.prato.linux.it/~gbartolini | ICQ#129221447
> "Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy, The Inferno