What's interesting is that if you read up on how many of the very large
sites have scaled (eBay, Digg, Facebook), they have gone through a
similar pattern over the course of their organic growth.  You can find
some of the write-ups at http://highscalability.com.  The general
pattern seems to be: a normalized database -> clustered databases ->
denormalization and 'sharding'.

Of course it would be somewhat silly to plan for this type of
architecture before you see where the load is actually hitting.  It's a
never-ending battle: you clear one bottleneck only to discover another.
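
As a concrete (and entirely hypothetical - model, column, and method
names are all invented) sketch of the denormalization Nick asks about
below: cache the associated IDs in a serialized column on the parent and
rebuild it from the join table, so the hot read path becomes a single IN
query.

  class AddCachedWidgetIdsToProjects < ActiveRecord::Migration
    def self.up
      # Redundant copy of the join table's widget IDs, purely for reads.
      add_column :projects, :cached_widget_ids, :text
    end

    def self.down
      remove_column :projects, :cached_widget_ids
    end
  end

  class Project < ActiveRecord::Base
    has_many :memberships
    has_many :widgets, :through => :memberships

    serialize :cached_widget_ids, Array

    # Write path: rebuild the cached ID list from the join table.
    def refresh_cached_widget_ids!
      update_attribute(:cached_widget_ids, memberships.map(&:widget_id))
    end

    # Read path: one IN query against widgets, no join.
    def cached_widgets
      return [] if cached_widget_ids.blank?
      Widget.find(cached_widget_ids)
    end
  end

The trade-off is the classic denormalization one: reads get cheap, and
every write to the join table now has to keep the cache honest.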

On 9/4/07, John Bresnik <[EMAIL PROTECTED]> wrote:
>
> Nick,
>
> Denormalization is traditionally found in data marts [data warehouses,
> OLAP] and is the moral opposite of everything you learn in DBA school.
> That said, denormalized designs make a lot of sense for performance
> reasons and are really nothing more than a lot of redundancy for the
> purpose of speeding up queries. [Normalization itself has much more to
> do with the disk space limitations of the 1980s than anything else
> and, like all dogma, isn't questioned anymore, even though it may not
> be as important or relevant as it once was.]
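>
> To make that concrete, a contrived sketch [table and column names are
> invented]: copy the author's name onto each post so the common listing
> query never has to join against the users table at all.
>
>   class DenormalizePosts < ActiveRecord::Migration
>     def self.up
>       # Redundant copy of users.name; the join is paid once, at write
>       # time, instead of on every read.
>       add_column :posts, :author_name, :string
>     end
>
>     def self.down
>       remove_column :posts, :author_name
>     end
>   end
>
>   class Post < ActiveRecord::Base
>     belongs_to :author, :class_name => 'User', :foreign_key => 'author_id'
>
>     # Keep the redundant column in sync whenever the post is saved.
>     before_save { |post| post.author_name = post.author.name if post.author }
>   end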
>
> The idea of caching relationships is a tricky but interesting one, and
> it sounds a lot like an inference engine concept called 'forward
> chaining' - the idea that all associations are worked out eagerly,
> ahead of time [as opposed to backward chaining, where they are
> determined on demand]. I say it's tricky because things can get
> complicated real fast [especially when it comes to refreshing this
> cache with new relationships].  But that's a start..
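>
> As a rough illustration of where the refresh gets hairy [model names
> invented, and it assumes the parent exposes some hypothetical
> refresh_cached_widget_ids! method that rebuilds its cached ID list]:
>
>   class Membership < ActiveRecord::Base
>     belongs_to :project
>     belongs_to :widget
>
>     # Every write to the join table has to keep the cache honest.
>     after_create  :refresh_project_cache
>     after_destroy :refresh_project_cache
>
>     private
>
>     def refresh_project_cache
>       # Hypothetical method on the parent that re-derives its cached
>       # association IDs from the join table.
>       project.refresh_cached_widget_ids! if project
>     end
>   end
>
> Cheap enough one row at a time, but thousands of these callbacks
> firing during a bulk import is exactly how it gets complicated.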
>
> Another viable idea would be, as you hinted, to keep a data structure
> in memory that represents the associations - this is of course a
> 'graph' in classical terms, and not an easily optimized data
> structure. There are some interesting implementations using hash maps
> / trees, e.g. you could probably develop a sufficient data structure
> by digging into Ruby and taking the time to understand how it
> implements each data structure [or revive/connect an old C lib that
> provides a graph] - then really just use the database as metadata for
> an existing relationship [the graph would only contain IDs to keep it
> lightweight, pointer-sized [4-8 bytes a node on most hardware], then
> expand to full rows once the determination is made]. But yeah, this is
> some heavy s*** man..
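>
> A bare-bones version of that idea in pure Ruby [class and usage names
> invented], with the graph holding nothing but integer IDs and the
> database consulted only once you know which rows you want:
>
>   require 'set'
>
>   # Adjacency list keyed by record ID; values are Sets of neighbor IDs.
>   class AssociationGraph
>     def initialize
>       @edges = Hash.new { |hash, key| hash[key] = Set.new }
>     end
>
>     # Record an undirected association between two records.
>     def connect(a_id, b_id)
>       @edges[a_id] << b_id
>       @edges[b_id] << a_id
>     end
>
>     def neighbor_ids(id)
>       @edges[id].to_a
>     end
>   end
>
>   graph = AssociationGraph.new
>   graph.connect(1, 2)
>   graph.connect(1, 3)
>   graph.neighbor_ids(1)  # => [2, 3] (order not guaranteed)
>   # Only now hit the database, e.g. Widget.find(graph.neighbor_ids(1))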
>
> brez
>
>
>
>
> On 9/4/07, Nick Zadrozny <[EMAIL PROTECTED]> wrote:
> > Hey all,
> >
> > I've been thinking about an optimization problem that I'm vaguely
> > familiar with and not quite sure how to get started on.
> >
> > I've got an application in which each record might be associated
> > with thousands of others through a join model. A has_many :through
> > situation. The join models are important in and of themselves, but
> > often I want to just grab all the associated objects, and this is
> > starting to get a bit burdensome on the database.
> >
> > I'm tentatively thinking that denormalization would help me out
> > here. But that sort of thing is approaching the limits of my
> > database knowledge. The question comes down to this: say you want to
> > cache the primary keys of thousands of associated objects. What
> > would that look like in your schema? In your queries?
> >
> > Like I said, database noob here, so let's have the noob explanation.
> > Also, pointers to books or tutorials are welcome. I'd welcome some
> > looks at alternate caching strategies - this information doesn't
> > necessarily have to persist - but denormalization is something I
> > would like to know more about in general.
> >
> > --
> > Nick Zadrozny • beyondthepath.com
>
>
> --
> John Bresnik
> (619) 228-6254