[PERFORM] Number of Columns and Update

2014-12-22 Thread Robert DiFalco
This may fall into the category of over-optimization but I've become
curious.

I have a user table with about 14 columns that are all 1:1 data - so they
can't be normalized.

When I insert a row all columns need to be set. But when I update, I
sometimes only update 1-2 columns at a time. Does the number of columns
impact update speed?

For example:
 UPDATE users SET email = ? WHERE id = ?;

I can easily break this up into logical tables like user_profile,
user_credential, user_contact_info, user_summary, etc with each table only
having 1-4 columns. But with the multiple tables I would often be joining
them to bring back a collection of columns.

I know I'm over thinking this but I'm curious of what the performance trade
offs are for breaking up a table into smaller logically grouped tables.

Thanks.


Re: [PERFORM] Number of Columns and Update

2014-12-22 Thread Heikki Linnakangas

On 12/22/2014 10:53 PM, Robert DiFalco wrote:

This may fall into the category of over-optimization but I've become
curious.

I have a user table with about 14 columns that are all 1:1 data - so they
can't be normalized.

When I insert a row all columns need to be set. But when I update, I
sometimes only update 1-2 columns at a time. Does the number of columns
impact update speed?

For example:
  UPDATE users SET email = ? WHERE id = ?;


Yes, the number of columns in the table matters. The update is just as 
expensive regardless of how many of the columns you update.


When a row is updated, PostgreSQL creates a new version of the whole 
row. The new row version takes more space when the table has more 
columns, leading to more bloating of the table, which generally slows 
things down. In most applications the difference isn't big enough to 
matter, but it can be significant if you have very wide rows, and you 
update a lot.


PostgreSQL 9.4 made an improvement on this. In earlier versions, the new 
row version was also included completely in the WAL record, which added 
overhead. In 9.4, any columns at the beginning or end of the row that 
are not modified are left out of the WAL record, as long as the new row 
version is stored on the same page as the old one (which is common). For 
updating a single column, or a few columns that are next to each other, 
that's the same as saying that only the modified part of the row is 
WAL-logged.



I can easily break this up into logical tables like user_profile,
user_credential, user_contact_info, user_summary, etc with each table only
having 1-4 columns. But with the multiple tables I would often be joining
them to bring back a collection of columns.


That would help with the above-mentioned issues, but dealing with 
multiple tables certainly adds a lot of overhead too. Most likely you're 
better off just having the single table, after all.


- Heikki



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Number of Columns and Update

2014-12-22 Thread Andrew Dunstan


On 12/22/2014 03:53 PM, Robert DiFalco wrote:
This may fall into the category of over-optimization but I've become 
curious.


I have a user table with about 14 columns that are all 1:1 data - so 
they can't be normalized.


When I insert a row all columns need to be set. But when I update, I 
sometimes only update 1-2 columns at a time. Does the number of 
columns impact update speed?


For example:
 UPDATE users SET email = ? WHERE id = ?;

I can easily break this up into logical tables like user_profile, 
user_credential, user_contact_info, user_summary, etc with each table 
only having 1-4 columns. But with the multiple tables I would often be 
joining them to bring back a collection of columns.


I know I'm over thinking this but I'm curious of what the performance 
trade offs are for breaking up a table into smaller logically grouped 
tables.





An update rewrites the whole row, not just the updated columns.

I think you are overthinking it.

cheers

andrew



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance