(follow up)

fwd: denormalization articles


from "The Data Administration Newsletter" (TDAN)


(earlier version?: http://www.tdan.com/i001fe02.htm )

-

http://www.tdan.com/i014ht04.htm

---begin 1 of 2---

DENORMALIZATION AND THE
RULES OF RECONSTRUCTION
Dick Root - Thistledown Consulting Services

Introduction

Software and database engineering are complex activities that require
planning and control to be successful.  By the time the DBA is called
upon to tune the indices of a database, it is probably already too
late.  Efficiency should be designed into the data structure before
the data is actually put on disk.  Since the advent of CASE tools, a
step has usually been missing from database design: the logical
database design is set up in the modeling tool and then the DDL is
generated, so the table design of the physical database is simply the
entity design of the logical database.  Then, when tuning is
required, data is moved around on disk, indices are applied,
freespace is modified, and more memory is assigned.

These DBMS-level tuning steps are valid and will continue to be used,
but a step has been missing from the database design process.

Physical Design of Databases

The word "denormalization" is used to describe changes to the table
design that cause the physical tables to differ from the normalized
entity relationship diagram.  "Denormalization" does not mean that
anything goes.  Denormalization does not mean chaos.  The development
of properly denormalized data structures follows software engineering
principles that ensure that information will not be lost.  If the
table is read-only (periodically refreshed from the system-of-record)
then the rules are looser.  Star schemas and hyper-cubes are read-
only denormalizations.  If the data is to be distributed and/or
segmented and added to, changed, or deleted from, then the rules of
reconstruction described below must be followed.  Fundamentally, a
single principle must be followed: if the individual table is updated
in more than one system, it must be possible to reconstruct the
original table as if the data had never been reformatted or taken
apart.
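
As a minimal sketch of that principle in SQL (the tables and columns
are hypothetical; the vertical split itself is sketched under
technique 4 below), suppose a logical CUSTOMER table has been split
into two physical tables that share the same key.  A view can then
reconstruct the original row image:

    -- CUSTOMER_BASE and CUSTOMER_DETAIL are two physical tables that
    -- vertically partition one logical CUSTOMER table and share the
    -- key CUST_ID.  Joining them on that key reconstructs the
    -- original table as if it had never been taken apart.
    CREATE VIEW customer AS
    SELECT b.cust_id, b.cust_name, b.cust_type,
           d.credit_limit, d.last_review_dt
      FROM customer_base   b
      JOIN customer_detail d ON d.cust_id = b.cust_id;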

Denormalization

There are many techniques for denormalizing a relational database
design.  These include:

1.       Duplicated data - This is the technique of making copies of
data, whole or in part, and storing and utilizing both the original
and the copies.  This technique is great unless you want to update
it.  This is the area of distributed updates and synchronization;
whole texts have been written on this subject.  The general idea is
that extra-DBMS processes must ensure integrity and accuracy.  Stored
joins are an example of duplicated data.
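
As a minimal sketch of such a synchronization process (Oracle trigger
syntax; the PRODUCT and ORDER_ITEM tables and their columns are
hypothetical), assume PRODUCT is the system of record and
PRODUCT_NAME has been duplicated into ORDER_ITEM:

    -- Propagate changes from the system of record to the duplicated
    -- copy so the two stay consistent.
    CREATE OR REPLACE TRIGGER trg_sync_product_name
    AFTER UPDATE OF product_name ON product
    FOR EACH ROW
    BEGIN
      UPDATE order_item
         SET product_name = :NEW.product_name
       WHERE product_id   = :NEW.product_id;
    END;
    /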

2.       Derived data - The issues with storing derived data are
accuracy and timeliness.  When the base data changes, the
derivation(s) must change accordingly.  When the semantics of the
derived column is "current balance," you have one sort of accuracy
problem.  When the semantics of the derived column is average sales
by product, salesman, division, and month, and the salesmen are
constantly being reassigned, you have another accuracy problem.
Also, many designers store the derivation in tables containing
inappropriate keys.  When derivations are not stored with their
logical (functionally dependent) keys, subsequent (tertiary)
derivations are inaccurate.  Also, many derivations are non-additive
(percents, highest, lowest, etc.).  This subject deserves many
chapters in data warehousing texts.  See references to summary data
and slowly changing dimensions.
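
As a minimal sketch (Oracle syntax; the SALES and SALES_SUMMARY
tables and their columns are hypothetical), the derived average below
is stored with the full set of keys it is functionally dependent on:

    CREATE TABLE sales_summary (
      product_id   NUMBER(10),
      salesman_id  NUMBER(10),
      division_id  NUMBER(10),
      sales_month  DATE,
      avg_sale_amt NUMBER(12,2),  -- derived and non-additive: do not sum it
      PRIMARY KEY (product_id, salesman_id, division_id, sales_month)
    );

    INSERT INTO sales_summary
    SELECT product_id, salesman_id, division_id,
           TRUNC(sale_date, 'MM'), AVG(sale_amt)
      FROM sales
     GROUP BY product_id, salesman_id, division_id,
              TRUNC(sale_date, 'MM');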

3.       Surrogate keys - There is a problem with very long and
compound keys in that they are hard to use when writing queries and
they generate inefficient indices.  If the table has a very long key
and also has many rows, this can generate a "show stopper" situation.
If the table has a maximum of 100,000,000 rows and a fifty-byte real
compound key, assigning a 10-digit surrogate key (and indexing on it)
will increase performance dramatically.  Imagine the situation where
the fifty-byte key is used in an equi-join!  The real key(s) should
not be deleted after the surrogate key is added.  That would make
reversing out the surrogate key impossible and would offend the Rule
of Reconstruction (see below).  Usually the long real key is made up
of many sub-keys that are useful in their own right.
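
A minimal sketch (Oracle syntax; the SHIPMENT table and its columns
are hypothetical): a ten-digit surrogate key carries the primary key
and the joins, while the fifty-byte real compound key is retained as
a unique alternate key so the surrogate can always be reversed out:

    CREATE TABLE shipment (
      shipment_id    NUMBER(10) PRIMARY KEY, -- surrogate key for joins
      country_cd     CHAR(3)  NOT NULL,      -- the real 50-byte compound
      port_cd        CHAR(5)  NOT NULL,      --   key is kept, not deleted,
      vessel_cd      CHAR(10) NOT NULL,      --   so the Rule of
      voyage_no      CHAR(12) NOT NULL,      --   Reconstruction holds
      bill_of_lading CHAR(20) NOT NULL,
      CONSTRAINT shipment_ak UNIQUE
        (country_cd, port_cd, vessel_cd, voyage_no, bill_of_lading)
    );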

4.       Over Normalization (Vertical partitioning/segmentation) -
This is the technique of splitting the original logical table into
two or more physical tables by assigning some of the columns to one
physical table and some to another.  Both tables end up with the same
number of rows and have the same keys (see "Rule of Reconstruction,"
below).  Broadly, this will increase performance, since the
individual tables are now smaller.  In most DBMSs the negative effect
of long row length is non-linear: the query time against a table with
a 1000-byte row length can be more than twice the query time against
a table with a 500-byte row length.  So even arbitrary vertical
partitioning will yield much better performance against each of the
separate partitions.  If you are constantly joining the partitions,
over normalization is self-defeating.  Therefore, the trick is to
cluster together the columns that are used together.
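
A minimal sketch of such a split (Oracle syntax; these are the same
hypothetical CUSTOMER tables used in the reconstruction view earlier),
clustering the frequently used columns in one physical table and the
rarely used columns in the other:

    CREATE TABLE customer_base (        -- columns used by most queries
      cust_id   NUMBER(10) PRIMARY KEY,
      cust_name VARCHAR2(80),
      cust_type CHAR(2)
    );

    CREATE TABLE customer_detail (      -- columns used only occasionally
      cust_id        NUMBER(10) PRIMARY KEY
                     REFERENCES customer_base (cust_id),
      credit_limit   NUMBER(12,2),
      last_review_dt DATE
    );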

5.       Horizontal segmentation - This is the technique of storing
some of the rows in one table and some in another.  Many modern DBMSs
can do this automatically.  When the criterion for segmentation is
not simple, segmentation must still be done programmatically.  Of
course, update anomalies occur when rows occur in more than one
segment.
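
A minimal sketch of DBMS-managed segmentation (Oracle range
partitioning syntax; the SALES_HISTORY table and its columns are
hypothetical), where the rows are segmented by year:

    CREATE TABLE sales_history (
      sale_id   NUMBER(12),
      sale_date DATE         NOT NULL,
      sale_amt  NUMBER(10,2)
    )
    PARTITION BY RANGE (sale_date)
    ( PARTITION sales_1999
        VALUES LESS THAN (TO_DATE('2000-01-01','YYYY-MM-DD')),
      PARTITION sales_2000
        VALUES LESS THAN (TO_DATE('2001-01-01','YYYY-MM-DD')),
      PARTITION sales_max
        VALUES LESS THAN (MAXVALUE) );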

6.       Stored Joins - This is the technique of joining two or more
tables together and storing the answer set as an additional table.
This is one of the most common denormalizations.  If the stored join
table is never updated, there is no problem with this.  Since this
always generates duplicate data, updates are a problem.  Look out for
query anomalies when a measurement column is on the many side of the
relation being joined.
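
A minimal sketch (SQL-92 join syntax; the ORDER_HEADER and ORDER_LINE
tables and their columns are hypothetical): the header/line join is
stored as an additional reporting table.  Note that the header-level
FREIGHT_AMT ends up repeated on every line row of the stored join, so
summing it from this table would double-count; this is exactly the
kind of query anomaly to look out for:

    CREATE TABLE order_flat AS
    SELECT h.order_id, h.order_dt, h.cust_id, h.freight_amt,
           l.line_no, l.product_id, l.qty, l.line_amt
      FROM order_header h
      JOIN order_line   l ON l.order_id = h.order_id;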

7.       Recurring data groups (vector data) - When a subordinate
table has a fixed, small number of rows per parent row, collapsing
the subordinate table into the parent table will increase
performance.  Care must be taken that the logical key of the
subordinate table is not deleted or obscured.  Otherwise the join is
not reversible and the "Rule of Reconstruction" is offended.
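
A minimal sketch (Oracle syntax; the EMPLOYEE and EMP_PHONE tables
and their columns are hypothetical): a subordinate phone table
limited to three well-known phone types is collapsed into the parent.
Because the phone type (the subordinate table's logical key) stays
visible in the column names, the subordinate table can still be
reconstructed:

    CREATE TABLE employee (
      emp_id       NUMBER(10) PRIMARY KEY,
      emp_name     VARCHAR2(80),
      home_phone   VARCHAR2(20),  -- collapsed EMP_PHONE rows; the phone
      work_phone   VARCHAR2(20),  --   type (the subordinate key) survives
      mobile_phone VARCHAR2(20)   --   in the column names
    );

    -- The original subordinate table can be reconstructed when needed.
    CREATE VIEW emp_phone AS
    SELECT emp_id, 'HOME' AS phone_type, home_phone AS phone_no
      FROM employee WHERE home_phone IS NOT NULL
    UNION ALL
    SELECT emp_id, 'WORK', work_phone
      FROM employee WHERE work_phone IS NOT NULL
    UNION ALL
    SELECT emp_id, 'MOBILE', mobile_phone
      FROM employee WHERE mobile_phone IS NOT NULL;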


...
(end)

---begin 2 of 2---

http://www.tdan.com/i008hy02.htm

GREAT NEWS -
THE RELATIONAL MODEL IS DEAD!
Michael Gorman, Whitemarsh Information Systems, Corp.

1. Introduction



...

7. SQL/99's Impact on Database Applications

For the past 20 years, database designers and implementors have
struggled with highly normalized databases that perform poorly. The
only solution is to denormalize by collapsing hierarchies of non-
redundant tables into a single flat table with replicated data. While
these highly redundant collapsed tables speed data reporting, they
slow updating and also become a significant risk to data integrity.
That is because the data is highly dispersed and duplicated across
these report-tuned denormalized database structures, which are
commonly known as data warehouses. For all these reasons, most
organizations only allow reporting from data warehouse databases.

As DBMS vendors implement SQL/99, the database design process will
transform itself from designing third-normal-form tables and then
denormalizing those tables to enable cost-effective reporting into a
set of database design activities similar to the ones that were
commonly performed in database design efforts of the middle 1970s
through the middle 1980s. A greater knowledge of the application's
processing will be needed to take advantage of the natural data
structure hierarchies now possible within SQL/99 tables.
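
As a minimal sketch of such a hierarchy (SQL:1999 syntax as
standardized; vendor support and exact syntax vary, and the table and
column names are hypothetical), a customer row can directly carry a
nested address and a small repeating group of phone numbers:

    CREATE TABLE customer (
      cust_id INTEGER PRIMARY KEY,
      name    VARCHAR(80),
      address ROW(street VARCHAR(40),   -- SQL:1999 row (structured) type
                  city   VARCHAR(30),
                  zip    CHAR(10)),
      phones  VARCHAR(20) ARRAY[3]      -- SQL:1999 collection (array) type
    );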

While processing speeds will dramatically improve with SQL/99-
conforming DBMSs, the effort and processing time required to
accomplish database redesigns and reorganizations will dramatically
increase.

In short, we are returning to the past, that is, adopting the data
structures of the network and independent logical file DBMSs. While
we will see increased performance for well-designed and highly tuned
databases, we will also see the return of significant designer and
analyst time for database design and redesigns.

Keith Hare of JCC Consulting (www.jcc.com), a long-time member of H2
and a user of VAX DBMS products, put it best when he said, "With
SQL/99 you can get the best of both worlds and of course, you can get
the worst of both worlds. It is up to the database practitioners to
do the right thing."

The long version of the paper is available from the "What's New"
section of the Whitemarsh web site:

http://www.wiscorp.com/whatsnew.html
-
http://www.wiscorp.com/featuredpapers.html


...
(end)

Author: Eric D. Pierce
  INET: [EMAIL PROTECTED]
