Thanks for the advice. Currently I see a lot of I/O related to updates/inserts, so I'm trying to track those down first. Regarding question 2: I read somewhere in the documentation that because of MVCC, the whole row has to be rewritten even if I update just a single column in that row. Hence, if the table is wider (has more columns), the update will be slower. Does this match your understanding?
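For what it's worth, one quick way to check this is to time the same single-column UPDATE against a narrow and a wide table. A minimal sketch in psql (the table names, row counts, and padding columns below are made up for illustration, not taken from the real schema):

-- Narrow table: just a key and the timestamp being updated.
CREATE TABLE cars_narrow (
    id         serial PRIMARY KEY,
    updated_at timestamptz
);

-- Wide table: same key and timestamp, plus padding columns standing in
-- for the many extra columns of the real table.
CREATE TABLE cars_wide (
    id         serial PRIMARY KEY,
    updated_at timestamptz,
    c1 text, c2 text, c3 text, c4 text, c5 text,
    c6 text, c7 text, c8 text, c9 text, c10 text
);

INSERT INTO cars_narrow (updated_at)
SELECT now() FROM generate_series(1, 1000000);

INSERT INTO cars_wide (updated_at, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10)
SELECT now(), 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x'
FROM generate_series(1, 1000000);

VACUUM ANALYZE cars_narrow;
VACUUM ANALYZE cars_wide;

-- With \timing on in psql, compare the two. Because of MVCC each UPDATE
-- writes a complete new row version, so the wide row writes more bytes.
UPDATE cars_narrow SET updated_at = now() WHERE id = 42;
UPDATE cars_wide   SET updated_at = now() WHERE id = 42;

Whether the difference is actually measurable depends on how much of the cost is the index lookup and page I/O rather than the extra bytes of the row version itself.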
On 28/11/2012 at 14.10, Willem Leenen <willem_lee...@hotmail.com> wrote:

> Niels,
>
> "I can't see why it would make sense to put that into a separate table and
> join in the values"
> You don't normalize for performance. People DEnormalize for performance.
>
> Questions: (AFAIK)
>
> 1) This is a road to disaster. Get yourself a book on RDBMSs from, for
> example, Celko. Do NOT go against the flow of the RDBMS rules, as here with
> rule #1: atomic values in a column.
>
> 2) This is not the big fish you are after. First benchmark your setup and
> compare the results with your desired performance level. Quantify your
> problem, if there is any, before using tricks.
>
> 3) A row will need more memory when it is wider; this may be amplified
> during hash joins.
>
> 4) People DEnormalize for performance.
>
> 5) "Is it significantly faster to select * from a table with 20 columns,
> than selecting the same 20 in a table with 150 columns?"
>
> I know the answer, but I encourage you to simply test this. I have seen lots
> of urban legends about performance (including dropping referential integrity
> because that would make a difference...).
> Of course, when it's a full table scan and it is ALL disk reads (or ALL
> memory reads), you can simply calculate it too. But just get into the habit
> of testing for learning.
>
> My advice:
> - know what performance you need.
> - test whether you have this, varying table content and system load.
> - do not tamper with the RDBMS rules; this will haunt you.
> - if you have the latest Postgres version, you can use covering indexes:
> tables aren't accessed at all, bypassing most of your questions. Check with
> peers that you've got the indexes right.
>
> Regards,
> Willem
>
>
> > From: nielskrist...@autouncle.com
> > Subject: [PERFORM] Database design - best practice
> > Date: Wed, 28 Nov 2012 13:41:14 +0100
> > To: pgsql-performance@postgresql.org
> >
> > Hi,
> >
> > I'm on the hunt for some solid knowledge on a theoretical level about the
> > performance of PostgreSQL. My question is regarding best practices and how
> > architectural decisions might influence performance. First a little
> > background:
> >
> > The setup:
> > I have a database which holds information on used cars. The database has
> > mainly 3 tables of interest for this case: a cars table, an adverts table,
> > and a sellers table. One car has many adverts and one seller has many
> > adverts. One advert belongs to one car and one seller.
> > The database is powering a website for searching used cars. When searching
> > for used cars, the cars table is mainly used, and a lot of the columns
> > should be directly available for searching, e.g. color, milage, price,
> > has_automatic_transmission, etc.
> >
> > So my main concern is actually about the cars table, since this one
> > currently has a lot of columns (151 - I expect that's quite a lot?) and a
> > lot of data (4 mil. rows, and growing). Now you might start by thinking
> > this could sound like a regular need for some normalization, but wait a
> > second and let me explain :-)
> > The columns in this table are for the most part very short strings,
> > integers, decimals, or booleans. Take for example has_automatic_transmission
> > (boolean): I can't see why it would make sense to put that into a separate
> > table and join in the values. Or the milage or the price, as other
> > examples. The cars table used for search is indexed quite a lot.
> >
> > The questions:
> > Having the above setup in mind, what impact does it have on performance,
> > in terms of read performance and write performance, if I do the following:
> > 1) In general, would reads and/or writes on the database be faster if I
> > serialized some of the non-searched columns in the table into a single
> > text column instead of, let's say, 20 booleans?
> > 2) Let's say I'm updating a timestamp in a single one of the 151 columns
> > in the cars table, and the update statement uses the id to find the car.
> > Would the write performance of that UPDATE be affected if the table had
> > fewer columns?
> > 3) When adding a new column to the table, I know that it becomes slower
> > the more rows are in the table, but what about the "width" of the table -
> > does that affect the performance of adding new columns?
> > 4) In general, what performance downsides do you get when adding a lot of
> > columns to one table instead of having them in separate tables?
> > 5) Is it significantly faster to select * from a table with 20 columns
> > than to select the same 20 in a table with 150 columns?
> >
> > Hope there are some good answers out there :-)
> >
> > --
> > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-performance
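As a concrete illustration of the covering-index suggestion quoted above (index-only scans, available since PostgreSQL 9.2): if every column a common search touches is part of one index, the planner can answer the query from the index alone without visiting the heap. A minimal sketch, where the index choice and column names are purely hypothetical, loosely following the example columns in the original mail:

-- Hypothetical index covering a common search on the cars table.
CREATE INDEX cars_search_idx
    ON cars (has_automatic_transmission, price, color);

-- Keep the visibility map current so index-only scans are actually chosen.
VACUUM ANALYZE cars;

-- All referenced columns are in the index, so this can run as an
-- "Index Only Scan" (verify in the plan output).
EXPLAIN (ANALYZE, BUFFERS)
SELECT price, color
FROM cars
WHERE has_automatic_transmission AND price < 10000;

The same EXPLAIN (ANALYZE, BUFFERS) habit is also an easy way to test questions 2 and 5 empirically rather than reasoning about them in the abstract.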