Thanks for the advice. Currently I see a lot of I/O related to updates/inserts, so I'm trying to track those down first. Regarding question 2: I read somewhere in the documentation that because of MVCC, the whole row has to be rewritten even if I update just a single column in that row. Hence, if the table is wider (has more columns), the update will be slower. Does this match your understanding?
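For what it's worth, one quick way to check this is to time the same single-column UPDATE against a narrow and a wide table. A minimal sketch in psql (the table names, row counts, and padding columns below are made up for illustration, not taken from the real schema):

-- Narrow table: just a key and the timestamp being updated.
CREATE TABLE cars_narrow (
    id         serial PRIMARY KEY,
    updated_at timestamptz
);

-- Wide table: same key and timestamp, plus padding columns standing in
-- for the many extra columns of the real table.
CREATE TABLE cars_wide (
    id         serial PRIMARY KEY,
    updated_at timestamptz,
    c1 text, c2 text, c3 text, c4 text, c5 text,
    c6 text, c7 text, c8 text, c9 text, c10 text
);

INSERT INTO cars_narrow (updated_at)
SELECT now() FROM generate_series(1, 1000000);

INSERT INTO cars_wide (updated_at, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10)
SELECT now(), 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x'
FROM generate_series(1, 1000000);

VACUUM ANALYZE cars_narrow;
VACUUM ANALYZE cars_wide;

-- With \timing on in psql, compare the two. Because of MVCC each UPDATE
-- writes a complete new row version, so the wide row writes more bytes.
UPDATE cars_narrow SET updated_at = now() WHERE id = 42;
UPDATE cars_wide   SET updated_at = now() WHERE id = 42;

Whether the difference is actually measurable depends on how much of the cost is the index lookup and page I/O rather than the extra bytes of the row version itself.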
On 28/11/2012 at 14.10, Willem Leenen <willem_lee...@hotmail.com> wrote:

> Niels,
>
> "I can't see why it would make sense to put that into a separate table and
> join in the values"
> You don't normalize for performance. People DEnormalize for performance.
>
> Questions: (AFAIK)
>
> 1) This is a road to disaster. Get yourself a book on RDBMSs from, for
> example, Celko. Do NOT go against the flow of the RDBMS rules, as here with
> rule #1: atomic values in a column.
>
> 2) This is not the big fish you are after. First benchmark your setup and
> compare the results with your desired performance level. Quantify your
> problem, if there is any, before using tricks.
>
> 3) A row will need more memory when it is wider; this may be amplified
> during hash joins.
>
> 4) People DEnormalize for performance.
>
> 5) "Is it significantly faster to select * from a table with 20 columns,
> than selecting the same 20 in a table with 150 columns?"
>
> I know the answer, but I encourage you to simply test this. I have seen lots
> of urban legends about performance (including dropping referential integrity
> because that would make a difference...).
> Of course, when it's a full table scan and it is ALL disk reads (or ALL
> memory reads), you can simply calculate it too. But just get into the habit
> of testing for learning.
>
> My advice:
> - know what performance you need.
> - test whether you have this, varying table content and system load.
> - do not tamper with the RDBMS rules; this will haunt you.
> - if you have the latest Postgres version, you can use covering indexes:
> tables aren't accessed at all, bypassing most of your questions. Check with
> peers that you've got the indexes right.
>
> Regards,
> Willem
>
>
> > From: nielskrist...@autouncle.com
> > Subject: [PERFORM] Database design - best practice
> > Date: Wed, 28 Nov 2012 13:41:14 +0100
> > To: pgsql-performance@postgresql.org
> >
> > Hi,
> >
> > I'm on the hunt for some solid knowledge on a theoretical level about the
> > performance of PostgreSQL. My question is regarding best practices and how
> > architectural decisions might influence performance. First a little
> > background:
> >
> > The setup:
> > I have a database which holds information on used cars. The database has
> > mainly 3 tables of interest for this case: a cars table, an adverts table,
> > and a sellers table. One car has many adverts and one seller has many
> > adverts. One advert belongs to one car and one seller.
> > The database is powering a website for searching used cars. When searching
> > for used cars, the cars table is mainly used, and a lot of the columns
> > should be directly available for searching, e.g. color, milage, price,
> > has_automatic_transmission, etc.
> >
> > So my main concern is actually about the cars table, since this one
> > currently has a lot of columns (151 - I expect that's quite a lot?) and a
> > lot of data (4 mil. rows, and growing). Now you might start by thinking
> > this could sound like a regular need for some normalization, but wait a
> > second and let me explain :-)
> > The columns in this table are for the most part very short strings,
> > integers, decimals, or booleans. Take for example has_automatic_transmission
> > (boolean): I can't see why it would make sense to put that into a separate
> > table and join in the values. Or the milage or the price, as other
> > examples. The cars table used for search is indexed quite a lot.
> >
> > The questions:
> > Having the above setup in mind, what impact does it have on performance,
> > in terms of read performance and write performance, if I do the following:
> > 1) In general, would reads and/or writes on the database be faster if I
> > serialized some of the non-searched columns in the table into a single
> > text column instead of, let's say, 20 booleans?
> > 2) Let's say I'm updating a timestamp in a single one of the 151 columns
> > in the cars table, and the update statement uses the id to find the car.
> > Would the write performance of that UPDATE be affected if the table had
> > fewer columns?
> > 3) When adding a new column to the table, I know that it becomes slower
> > the more rows are in the table, but what about the "width" of the table -
> > does that affect the performance of adding new columns?
> > 4) In general, what performance downsides do you get when adding a lot of
> > columns to one table instead of having them in separate tables?
> > 5) Is it significantly faster to select * from a table with 20 columns
> > than to select the same 20 in a table with 150 columns?
> >
> > Hope there are some good answers out there :-)
> >
> > --
> > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-performance
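As a concrete illustration of the covering-index suggestion quoted above (index-only scans, available since PostgreSQL 9.2): if every column a common search touches is part of one index, the planner can answer the query from the index alone without visiting the heap. A minimal sketch, where the index choice and column names are purely hypothetical, loosely following the example columns in the original mail:

-- Hypothetical index covering a common search on the cars table.
CREATE INDEX cars_search_idx
    ON cars (has_automatic_transmission, price, color);

-- Keep the visibility map current so index-only scans are actually chosen.
VACUUM ANALYZE cars;

-- All referenced columns are in the index, so this can run as an
-- "Index Only Scan" (verify in the plan output).
EXPLAIN (ANALYZE, BUFFERS)
SELECT price, color
FROM cars
WHERE has_automatic_transmission AND price < 10000;

The same EXPLAIN (ANALYZE, BUFFERS) habit is also an easy way to test questions 2 and 5 empirically rather than reasoning about them in the abstract.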