[PERFORM] design question: general db performance

2003-11-25 Thread shane hill
Hi folks,

Disclaimer:  I am relatively new to RDBMSs, so please do not laugh at me 
too loudly,  you can laugh, just not too loudly and please do not point. :)

I am working on an Automated Installer Testing System for Adobe Systems 
and I am doing a DB redesign of the current postgres db:

1. We are testing a matrix of over 900 Acrobat installer configurations 
and we are tracking every file and registry entry that is affected by an 
installation.

2. a single file or registry entry that is affected by any test is 
stored in the db as a record.

3. a typical record is about 12 columns of string data. the data is all 
information about a file (mac or windows) or windows registry entry [ 
file or regkey name, file size,  modification date, checksum, 
permissions, owner, group, and in the case of a mac, we are getting all 
the hfs atts as well].

4. A typical test produces anywhere from 2000 - 5000 records.

Our db is getting to be a respectable size (about 10GB right now) and is 
growing slower and slower. I have been charged with making it faster and 
with a smaller footprint while retaining all of the current 
functionality.  here is one of my ideas.  Please tell me if I am crazy:

The strings that we are storing (mentioned in 3 above) are extremely 
repetitive. for example, there are a limited number of permissions for 
the files in the acrobat installer and we are storing this information 
over and over again in the tables. The same goes for filenames, registry 
key names and almost all of the data we are storing.  So it seems to me 
that to create a smaller and faster database we could assign an integer 
to each string and just store the integer representation of the string 
rather than the string itself.  Then we would just store the strings in 
a separate table one time and do join queries against the tables that 
are holding the strings and the main data tables.  for example,

a table that would hold unique permissions strings would look like

table:  perms_strs

string |  id
-
'drwxr-xr-x'   |   1
'-rw---'|   2
'drwxrwxr-x'  |   3
'-rw-r--r--' |   4
then in my data I would just store 1,2,3 or 4 instead of the whole 
permissions string.

it seems to me that we would save lots of space and over time not see 
the same performance degradation.

anyways,  please tell me if this makes sense and make any other 
suggestions that you can think of.  I am just now starting this analysis 
so I cannot give specifics as to where we are seeing poor performance 
just yet.  just tell me if my concepts are correct.

thanks for your time and for suffering this email.

chao,

-Shane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] design question: general db performance

2003-11-25 Thread Jord Tanner
[small chuckle]

By George, I think he's got it!

You are on the right track. Have a look at this link on database
normalization for more info:

http://databases.about.com/library/weekly/aa080501a.htm 



On Tue, 2003-11-25 at 10:42, shane hill wrote:
 Hi folks,
 
 Disclaimer:  I am relatively new to RDBMSs, so please do not laugh at me 
 too loudly,  you can laugh, just not too loudly and please do not point. :)
 

[snip]

-- 
Jord Tanner [EMAIL PROTECTED]


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] design question: general db performance

2003-11-25 Thread Josh Berkus
Shane,

 Disclaimer:  I am relatively new to RDBMSs, so please do not laugh at me 
 too loudly,  you can laugh, just not too loudly and please do not point. :)

Hey, we all started somewhere.  Nobody was born knowing databases.  Except 
maybe Neil Conway.

 I am working on an Automated Installer Testing System for Adobe Systems 
 and I am doing a DB redesign of the current postgres db:

Cool!We're going to want to talk to you about a case study later, if you 
can get your boss to authorize it 

 Our db is getting to be a respectable size (about 10GB right now) and is 
 growing slower and slower. 

Slower and slower?   Hmmm ... what's your VACUUM. ANALYZE  REINDEX schedule?  
What PG version?  What are your postgresql.conf settings?   Progressive 
performance loss may indicate a problem with one or more of these things ...

 then in my data I would just store 1,2,3 or 4 instead of the whole 
 permissions string.
 
 it seems to me that we would save lots of space and over time not see 
 the same performance degradation.

Yes, this is a good idea.   Abstracting other repetitive data is good too.  
Also keep in mind that the permissions themselves can be represented as octal 
numbers instead of strings, which takes less space.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html