On 1/17/09, Erik Jones <ejo...@engineyard.com> wrote:
> On Jan 15, 2009, at 8:06 PM, Tom Lane wrote:
>> "Jamie Tufnell" <die...@googlemail.com> writes:
>>> item_count int -- this is derived from (select count(*) from items
>>> where group_id = id)
>>> ...
>>
>>> item_count would be updated by insert/update/delete triggers on the
>>> items table, hopefully that would ensure it is always correct?
>>
>> Concurrent updates to the items table make this much harder than
>> it might first appear.  If you're willing to serialize all your
>> updating
>> transactions then you can make it work, but ...
>
> That was exactly the caveat I was about to point out.  That being
> said, keeping COUNT() values and other statistics computed from
> other data in the database *is* a fairly common tactic.  One method
> that I've used to great success to avoid the serialization problem is
> to have your triggers insert the information needed for the update
> into a separate "update queue" table.  You then have a separate
> process that routinely sweeps that queue, aggregates the pending
> updates, and applies the net change to the count value of each
> affected groups entry.
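(For concreteness, the queue approach described above might look roughly like this in PostgreSQL, assuming the groups(id, item_count) / items(id, group_id) schema from earlier in the thread; the table, trigger, and function names here are illustrative, not from the thread.)

```sql
-- Queue table: one row per item change, to be rolled up later.
CREATE TABLE item_count_queue (
    group_id integer NOT NULL,
    delta    integer NOT NULL  -- +1 for an insert, -1 for a delete
);

-- Trigger function: record the change instead of updating groups directly,
-- so concurrent writers never contend on the same groups row.
CREATE OR REPLACE FUNCTION queue_item_count() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO item_count_queue VALUES (NEW.group_id, 1);
    ELSIF TG_OP = 'DELETE' THEN
        INSERT INTO item_count_queue VALUES (OLD.group_id, -1);
    ELSIF TG_OP = 'UPDATE' AND NEW.group_id <> OLD.group_id THEN
        INSERT INTO item_count_queue VALUES (OLD.group_id, -1);
        INSERT INTO item_count_queue VALUES (NEW.group_id, 1);
    END IF;
    RETURN NULL;  -- AFTER trigger; return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER items_count_queue
    AFTER INSERT OR UPDATE OR DELETE ON items
    FOR EACH ROW EXECUTE PROCEDURE queue_item_count();

-- The separate sweeper process, run periodically in one transaction:
BEGIN;
UPDATE groups g
   SET item_count = g.item_count + q.total
  FROM (SELECT group_id, sum(delta) AS total
          FROM item_count_queue
         GROUP BY group_id) q
 WHERE g.id = q.group_id;
DELETE FROM item_count_queue;
COMMIT;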

Fortunately our items table rarely sees concurrent writes.  It's over
99% reads and is typically updated by just one user.  We are already
caching these aggregates and other data in a separate layer and my
goal is to see if I can get rid of that layer.

In light of your advice though, I will think things through a bit more first.

Thanks for your help!
Jamie

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql
