Re: [PERFORM] Compare rows

Josh Berkus Wed, 08 Oct 2003 22:39:47 -0700

Chris,

> Some time in the late '80s, probably '88 or '89, there was a paper
> presented in Communications of the ACM that proposed using this sort
> of "hypernormalized" schema as a way of having _really_ narrow schemas
> that would be exceedingly expressive.  They illustrated an example of
<snip>
> The entertaining claim was that they felt they could model the
> complexities of the operations of any sort of company using not more
> than 50 tables.  It seemed somewhat interesting, at the time; it truly
> resonated as Really Interesting when I saw SAP R/3, with its bloat of
> 1500-odd tables.


One can always take things too far.   Trying to make everying 100% dynamic so 
that you can cram your whole database into 4 tables is going too far; so is 
the kind of bloat that produces systems like SAP, which is more based on 
legacy than design (I analyzed a large commercial billing system once and was 
startled to discover that 1/4 of its 400 tables and almost half of the 40,000 
collective columns were not used and present only for backward 
compatibility).

The usefulness of the "vertical values child table" which I suggest is largely 
dependant on the number of values not represented.   In Greg's case, fully 
75% of the fields in his huge table are NULL; this is incredibly inefficient, 
the more so when you consider his task of calling each field by name in each 
query.

The "vertical values child table" is also ideal for User Defined Fields or any 
other form of user-configurable add-on data which will be NULL more often 
than not.

This is an old SQL concept, though; I'm sure it has an official name 
somewhere.

> The need to do a lot of joins would likely hurt performance somewhat,
> as well as the way that it greatly increases the number of rows.
> Although you could always split it into several tables, one for each
> "value_type", and UNION them into a view...

It increases the number of rows, yes, but *decreases* the storage size of data 
by eliminating thousands ... or millions ... of NULL fields.   How would 
splitting the vertical values into dozens of seperate tables help things?

Personally, I'd rather have a table with 3 columns and 8 million rows than a 
table with 642 columns and 100,000 rows.  Much easier to deal with.

And we are also assuming that Greg seldom needs to see all of the fields at 
once.   I'm pretty sure of this; if he did, he'd have run into the "wide row" 
bug in 7.3 and would be complaining about it.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
      joining column's datatypes do not match

Re: [PERFORM] Compare rows

Reply via email to