Disclaimer: I am relatively new to RDBMSs, so please do not laugh at me too loudly, you can laugh, just not too loudly and please do not point. :)
I am working on an Automated Installer Testing System for Adobe Systems and I am doing a DB redesign of the current postgres db:
1. We are testing a matrix of over 900 Acrobat installer configurations and we are tracking every file and registry entry that is affected by an installation.
2. a single file or registry entry that is affected by any test is stored in the db as a record.
3. a typical record is about 12 columns of string data. the data is all information about a file (mac or windows) or windows registry entry [ file or regkey name, file size, modification date, checksum, permissions, owner, group, and in the case of a mac, we are getting all the hfs atts as well].
4. A typical test produces anywhere from 2000 - 5000 records.
Our db is getting to be a respectable size (about 10GB right now) and is growing slower and slower. I have been charged with making it faster and with a smaller footprint while retaining all of the current functionality. here is one of my ideas. Please tell me if I am crazy:
The strings that we are storing (mentioned in 3 above) are extremely repetitive. for example, there are a limited number of permissions for the files in the acrobat installer and we are storing this information over and over again in the tables. The same goes for filenames, registry key names and almost all of the data we are storing. So it seems to me that to create a smaller and faster database we could assign an integer to each string and just store the integer representation of the string rather than the string itself. Then we would just store the strings in a separate table one time and do join queries against the tables that are holding the strings and the main data tables. for example,
a table that would hold unique permissions strings would look like
string | id --------------------- 'drwxr-xr-x' | 1 '-rw-------' | 2 'drwxrwxr-x' | 3 '-rw-r--r--' | 4
then in my data I would just store 1,2,3 or 4 instead of the whole permissions string.
it seems to me that we would save lots of space and over time not see the same performance degradation.
anyways, please tell me if this makes sense and make any other suggestions that you can think of. I am just now starting this analysis so I cannot give specifics as to where we are seeing poor performance just yet. just tell me if my concepts are correct.
thanks for your time and for suffering this email.
---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings