Re: [sqlite] Denormalisation

2009-07-27 Thread Jay A. Kreibich
On Mon, Jul 27, 2009 at 02:44:08PM -0700, CityDev scratched on the wall:
> 
> More recently the term 'denormalise' has been used
> instead. This is where you repeat foreign data in a table to avoid the
> overhead of joins at runtime. 

> Over the intervening years I can't ever remember denormalising data (even
> when dealing with eg 13 million insurance customers in a table). Is it OK
> nowadays to say always aim to be fully normalised - modern RDBMSs are
> usually powerful enough to cope with most anything?

  Normalization is a fundamental part of the Relational Model.  The
  first three Normal Forms were defined as part of the original
  research into the Relational Model.

  As you might remember, the basic idea behind normalization is to
  eliminate all duplicate data.  This is critical to data integrity,
  which is critical to the assumptions made by the Relational Model.
  A big part of normalization is making sure that each table (relation)
  is used to represent one and only one "thing."  A side effect of this
  is that when you normalize your design there is only one copy of any
  given atomic data unit.  If there is only one copy, it is impossible
  for that data to get out of sync, which further supports the goals of
  data integrity and consistency.

  It also tends to INCREASE performance for everything but SELECT
  statements.  In general, normalized databases tend to be smaller,
  thanks to all the duplicate elimination.  That means less I/O,
  which is a big performance cost for many database systems.  If a
  smaller database can hold the same represented data, it also means
  a higher percentage of the unique data can fit into a given size
  cache or memory pool, further reducing the need for I/O.  It also
  means that if you need to update something (insert/update/delete)
  those changes tend to be extremely localized, making the change
  smaller and faster.

  The one trick is that if you split your data up into a bunch of
  tables, you need to spend time gathering it back up for most queries.
  That means JOINs.
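
  For example, here is a minimal sketch of what that looks like, using
  hypothetical customer/city tables (the names are invented for
  illustration, not taken from this thread):

    CREATE TABLE city (
      id   INTEGER PRIMARY KEY,  -- one row, and one copy, of each city
      name TEXT NOT NULL
    );

    CREATE TABLE customer (
      id      INTEGER PRIMARY KEY,
      name    TEXT NOT NULL,
      city_id INTEGER REFERENCES city(id)  -- points at the single city row
    );

    -- Gathering the data back up means a JOIN:
    SELECT customer.name, city.name
    FROM   customer
    JOIN   city ON city.id = customer.city_id;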

  There are two problems with JOINs.  First, unless you're careful
  about your index placement and how the query is set up, they can become
  very expensive.  Normally they're fairly efficient, but it only takes
  a misplaced index to really screw things up.  So you have to
  understand what you're doing and pay attention.  Not a real big deal,
  but you need to know what is going on.
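
  Continuing the hypothetical schema above, "paying attention to index
  placement" in SQLite looks roughly like this (a sketch, not a recipe):

    -- Without an index on customer(city_id), finding every customer in
    -- one city means scanning the entire customer table:
    CREATE INDEX customer_city_idx ON customer(city_id);

    -- Then ask SQLite what it actually plans to do:
    EXPLAIN QUERY PLAN
    SELECT customer.name
    FROM   city
    JOIN   customer ON customer.city_id = city.id
    WHERE  city.name = 'Portland';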
  
  The other issue (that plays strongly into the first) is that a lot of
  people fundamentally cannot wrap their heads around JOINs.  If you're
  uncomfortable with JOINs, you tend to design databases that don't
  require them.  And that's really too bad, as the JOIN is, in many
  ways, *The* fundamental Relational operator.


  So, in general, I would say yes... the goal should be 3NF (in nearly
  all cases, if you get to 3NF you get to 5NF... the last two are to
  cover a few somewhat unusual edge cases).  The normal forms really
  buy you a lot of practical benefits, and are a lot more than just a
  bunch of stuffy academic design rules.  
  
  I think it is fair to view denormalization as a last-resort performance
  optimization, but it should be seen as an optimization, and it should
  be very nearly a last resort.  It shouldn't be done until you can prove the
  performance isn't there, that there are no other practical ways of
  getting it back (e.g. better index placement), and that you've
  measured and profiled (and NOT guessed) that the JOINs (or whatever
  the denormalization is going to get rid of) are the actual cause of
  the performance issues.
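
  "Measured and profiled" can be as simple as timing the real query in
  the sqlite3 shell (assuming it was built with .timer support) and
  reading its EXPLAIN QUERY PLAN output, as sketched above:

    .timer on

    SELECT customer.name, city.name
    FROM   customer
    JOIN   city ON city.id = customer.city_id;

  Compare the elapsed times before and after any change before blaming
  the JOINs.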


  The big exception to this, and where the most blatant denormalization
  takes place, is in data warehouses.  That's a pretty special case,
  however.  First, the data is basically write-once, read-many.  Once the
  data is imported, it is usually treated as read-only.  The lack of
  updates means there is less chance of inconsistencies being
  introduced-- assuming the original data is good.  Second, the data is
  usually dumped from another database that has higher normalization.
  That is, the data is "known good" due to the fact that it came from
  what was, in theory, a properly normalized and well formed database.
  This essentially turns the data warehouse into a series of "instanced
  views" where useful sets of data are "pre-queried" into common
  sub-parts of queries that are normally joined together from several
  tables for nearly every query you might want to run.  Essentially
  exactly the kind of thing you might use a view for, only you select
  it into a table.  Since the database lacks updates except,
  typically, for nightly batch imports, this works out just fine as
  long as you remember to re-form the "views" each time you import new
  data.
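
  As a rough sketch (table names invented for illustration, given
  normalized sale/product/region tables in the source database), such an
  "instanced view" is just a SELECT materialized into a table, rebuilt
  after each nightly import:

    DROP TABLE IF EXISTS sales_by_region;

    CREATE TABLE sales_by_region AS
      SELECT region.name      AS region,
             product.name     AS product,
             SUM(sale.amount) AS total
      FROM   sale
      JOIN   product ON product.id = sale.product_id
      JOIN   region  ON region.id  = sale.region_id
      GROUP BY region.name, product.name;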


  On the other hand, with the advent of the data-driven website, many
  databases see a far greater number of reads than writes, so it makes
  some sense to push read performance.

Re: [sqlite] Denormalisation

2009-07-27 Thread Jim Showalter
Try writing apps for cellphones and both space and time become 
important again!

- Original Message - 
From: "Simon Slavin" <slav...@hearsay.demon.co.uk>
To: "General Discussion of SQLite Database" <sqlite-users@sqlite.org>
Sent: Monday, July 27, 2009 3:33 PM
Subject: Re: [sqlite] Denormalisation


>
> On 27 Jul 2009, at 10:44pm, CityDev wrote:
>
>> Over the intervening years I can't ever remember denormalising data
>> (even
>> when dealing with eg 13 million insurance customers in a table). Is
>> it OK
>> nowadays to say always aim to be fully normalised - modern RDBMSs 
>> are
>> usually powerful enough to cope with most anything?
>
> Performance/space/time/money payoff calculation.  The smallest 
> mundane
> laptop these days comes with a 120 Gig hard disk.  No real need to
> economise on file size, so if you need fast processing you might use
> lots of space.  On the other hand if you need to cart your entire
> dataset around on a USB Flash drive you might want to save space.
>
> The advantage is that most programmers don't have to care which way
> they go: they write the program any way it occurs to them.  The
> disadvantage of this is that when they hit the first job where the
> difference does matter, they don't know what to do about it.
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users 



Re: [sqlite] Denormalisation

2009-07-27 Thread BareFeet
> Is it OK nowadays to say always aim to be fully normalised - modern  
> RDBMSs are usually powerful enough to cope with most anything?


I suggest that aiming for fully normalised databases is the way to go,  
for the traditional and still relevant reasons of data integrity, file  
size, predictability, and generally only having to make data changes  
once. AFAIK database engines such as SQLite are optimised to cope with  
joins without suffering slowdowns and may even be faster, depending on  
the schema and data set. For instance, an integer primary key is more
efficiently searched than text.
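
As a small, made-up illustration of that point: the INTEGER PRIMARY KEY
below is SQLite's rowid, so the join matches small integers rather than
repeated text.

  CREATE TABLE region (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
  CREATE TABLE policy (id INTEGER PRIMARY KEY,
                       region_id INTEGER REFERENCES region(id));

  -- The join key is a small integer, not a repeated text value:
  SELECT region.name, COUNT(*)
  FROM   policy
  JOIN   region ON region.id = policy.region_id
  GROUP BY region.name;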

> what people would say is normalising is the easy part; the skill  
> comes in 'collapsing'

I would agree that "collapsing" the normalized data into something  
humanly readable is probably the biggest difficulty. Joining tables to
show related data is fairly trivial for any database designer worth
their salt, but it can be difficult to facilitate updates to that  
data, where user input requires changing several related tables with  
related values.

I suggest that more fully exploiting "views" makes this easier to set  
up and maintain. In particular, using "instead of" triggers on the  
views greatly simplifies the design while shifting the burden of data  
integrity to the SQL engine, rather than creating pages of code in  
your own application. "Instead of" triggers cope with inserts, deletes  
and updates in a view, passing on the actions as desired to the  
underlying tables or raising exceptions for disallowed data etc. Your  
application (or even SQL command line) can then facilitate user  
changes to data in the views in the same way that it would to a  
denormalised table. It is fully portable from one application to  
another.
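
As a rough sketch of that approach (the view, table, and column names
here are invented, not from an actual schema):

  CREATE TABLE city     (id INTEGER PRIMARY KEY, name TEXT);
  CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT,
                         city_id INTEGER REFERENCES city(id));

  CREATE VIEW customer_details AS
    SELECT customer.id, customer.name, city.name AS city
    FROM   customer
    JOIN   city ON city.id = customer.city_id;

  -- Let UPDATEs against the view flow through to the base table:
  CREATE TRIGGER customer_details_update
  INSTEAD OF UPDATE OF name ON customer_details
  BEGIN
    UPDATE customer SET name = NEW.name WHERE id = OLD.id;
  END;

Your application (or the command line) can then just run, say,
"UPDATE customer_details SET name = 'New Name' WHERE id = 1;" and the
trigger decides which underlying table actually changes.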

Tom
BareFeet

  --
Comparison of SQLite GUI applications:
http://www.tandb.com.au/sqlite/compare/?ml




Re: [sqlite] Denormalisation

2009-07-27 Thread Simon Slavin

On 27 Jul 2009, at 10:44pm, CityDev wrote:

> Over the intervening years I can't ever remember denormalising data  
> (even
> when dealing with eg 13 million insurance customers in a table). Is  
> it OK
> nowadays to say always aim to be fully normalised - modern RDBMSs are
> usually powerful enough to cope with most anything?

Performance/space/time/money payoff calculation.  The smallest mundane  
laptop these days comes with a 120 Gig hard disk.  No real need to  
economise on file size, so if you need fast processing you might use  
lots of space.  On the other hand if you need to cart your entire  
dataset around on a USB Flash drive you might want to save space.

The advantage is that most programmers don't have to care which way  
they go: they write the program any way it occurs to them.  The  
disadvantage of this is that when they hit the first job where the  
difference does matter, they don't know what to do about it.

Simon.


Re: [sqlite] Denormalisation

2009-07-27 Thread Rich Shepard
On Mon, 27 Jul 2009, CityDev wrote:

> Over the intervening years I can't ever remember denormalising data (even
> when dealing with eg 13 million insurance customers in a table). Is it OK
> nowadays to say always aim to be fully normalised - modern RDBMSs are
> usually powerful enough to cope with most anything?

   My opinion: _always_ go for 4th normal form. I believe it was the
combination of less robust RDBMSs and less powerful hardware that used
to justify denormalizing for speed on rare occasions.

Rich

-- 
Richard B. Shepard, Ph.D.        |  Integrity  Credibility
Applied Ecosystem Services, Inc. |  Innovation
 Voice: 503-667-4517  Fax: 503-667-8863


[sqlite] Denormalisation

2009-07-27 Thread CityDev

I did a Computer Science MSc 30 years ago specialising in databases (the
relational model was only in prototypes). Of course normalisation was well
known, but what people would say is normalising is the easy part; the skill
comes in 'collapsing'. More recently the term 'denormalise' has been used
instead. This is where you repeat foreign data in a table to avoid the
overhead of joins at runtime. 

Over the intervening years I can't ever remember denormalising data (even
when dealing with eg 13 million insurance customers in a table). Is it OK
nowadays to say always aim to be fully normalised - modern RDBMSs are
usually powerful enough to cope with most anything?
