> May be that's the opportunity for another question I have: given a foreign
> key (a,b) where "a" and "b" are more than a few bytes (not small) or are
> of variable size (still hopefully limited), are the values for "a" and "b"
> duplicated or do the foreign key creates a kind of references? (may be
> with an hash or even a short ID for the bigger value) If it's duplicated,
> then I will use integer keys instead. A bit long ago, I questioned the
> habit of mechanically using integers PK (and also FK), feeling using the
> literal values is more readable and simplifies queries' text. If my
> assumption is wrong (i.e. this is not by reference and there are copies
> every where), then I will have views for readable consultation and will
> bother less about more verbose queries.

An index will contain the actual data from the columns being indexed, plus the 
row number of the table to which the entry corresponds (or, for without rowid 
tables, the ENTIRE PRIMARY KEY column data to use to locate the underlying 
record).  This is why it is called a RELATIONAL DATABASE.  The entire thing 
operates only on the DATA and nothing but the data and does not contain 
"pointers".

Hierarchical, Network, and Network Extended database models use pointers in 
sets rather than duplicating the data.  This makes them orders of magnitude 
faster (when properly designed) than a Relational Model database, but means 
that there is no recovery possible where a pointer-chain becomes corrupted -- 
with the relational model everything has a copy (multiple duplicates) of the 
data so you just drop the corrupted thing and re-create it.

If a column contains a bunch of repeated data (say a phone book).  You can 
create the structures the following ways:

create table PhoneBookEntry
(
  surname text collate nocase not null,
  firstname text collate nocase not null,
  ...
  primary key (surname, firstname)
);

This will duplicate all the surnames and firstnames in the index.  It also 
means that you cannot have two people with the same name.  So maybe you add 
something else to the key, like the address.  This is then duplicated as well 
in the index and now your constraint is that you cannot have two people with 
the same names at the same address.

If you analyzed the problem, you would note that the Phone Number is the actual 
primary key, and the other data are merely attributes of the phone number.  For 
most efficient operations you would probably end up with something like this:

create table Surnames
(
 surname_id integer primary key,
 surname text not null collate nocase unique
);

create table GivenNames
(
 given_id integer primary key,
 givenname text not null collate nocase unique
);

create table Streets
(
  street_id integer primary key,
  streetname text not null collate nocase unique
);

create table Addresses
(
 address_id integer primary key,
 street_no integer,
 suffix text collate nocase,
 street_id integer not null references Streets
);
create unique index Address_Streetid on Addresses (street_id);
create index Address_Streetno on Addresses (Street_no, Suffix);

create table PhoneDirectory
(
 surname_id integer not null references Surnames,
 given_id integer not null references GivenNames,
 address_id integer not null references Addresses,
 PhoneNumber text collate nocase primary key
);
create index PD_Surnameid on PhoneDirectory (surname_id);
create index PD_Givenid on PhoneDirectory (given_id);
create index PD_addressid on PhoneDirectory (address_id);

create view v_PhoneDirectory
as
      select Surname, Givenname, Street_No, Suffix, Streetname, PhoneNumber
        from PhoneDirectory
natural join Addresses
natural join Streets
natural join GivenNames
natural join Surnames;

Then you use the view for all your queries of the PhoneDirectory.  This works 
very well if the optimizer properly prunes out any unnecessary joins caused by 
unreferenced columns in the view.  Even if not, all queries will run very 
quickly even with billions of records in the PhoneDirectory.  (NATURAL JOIN is 
syntactic sugar creating a WHERE clause that joins on like named columns).

This database would be in BCNF normal form.  (Although the streetno and suffix 
ought to be moved out to a separate table(s) if you need 4th or 5th normal).  
The model gets very much more complicated if you also have to handle municipal 
names, city names, etc.

Note that with this structure you can find all the "John" living on "Queen 
Street" in just nanoseconds.  Using it to produce a printed phone book, 
however, is not very efficient.  The structure is designed to optimize querying 
and eliminate update anomolies.  Since printed phone books are only produced 
once a year, the fact that this operation is not efficient is immaterial.




Reply via email to