> May be that's the opportunity for another question I have: given a foreign > key (a,b) where "a" and "b" are more than a few bytes (not small) or are > of variable size (still hopefully limited), are the values for "a" and "b" > duplicated or do the foreign key creates a kind of references? (may be > with an hash or even a short ID for the bigger value) If it's duplicated, > then I will use integer keys instead. A bit long ago, I questioned the > habit of mechanically using integers PK (and also FK), feeling using the > literal values is more readable and simplifies queries' text. If my > assumption is wrong (i.e. this is not by reference and there are copies > every where), then I will have views for readable consultation and will > bother less about more verbose queries.
An index will contain the actual data from the columns being indexed, plus the row number of the table to which the entry corresponds (or, for without rowid tables, the ENTIRE PRIMARY KEY column data to use to locate the underlying record). This is why it is called a RELATIONAL DATABASE. The entire thing operates only on the DATA and nothing but the data and does not contain "pointers". Hierarchical, Network, and Network Extended database models use pointers in sets rather than duplicating the data. This makes them orders of magnitude faster (when properly designed) than a Relational Model database, but means that there is no recovery possible where a pointer-chain becomes corrupted -- with the relational model everything has a copy (multiple duplicates) of the data so you just drop the corrupted thing and re-create it. If a column contains a bunch of repeated data (say a phone book). You can create the structures the following ways: create table PhoneBookEntry ( surname text collate nocase not null, firstname text collate nocase not null, ... primary key (surname, firstname) ); This will duplicate all the surnames and firstnames in the index. It also means that you cannot have two people with the same name. So maybe you add something else to the key, like the address. This is then duplicated as well in the index and now your constraint is that you cannot have two people with the same names at the same address. If you analyzed the problem, you would note that the Phone Number is the actual primary key, and the other data are merely attributes of the phone number. For most efficient operations you would probably end up with something like this: create table Surnames ( surname_id integer primary key, surname text not null collate nocase unique ); create table GivenNames ( given_id integer primary key, givenname text not null collate nocase unique ); create table Streets ( street_id integer primary key, streetname text not null collate nocase unique ); create table Addresses ( address_id integer primary key, street_no integer, suffix text collate nocase, street_id integer not null references Streets ); create unique index Address_Streetid on Addresses (street_id); create index Address_Streetno on Addresses (Street_no, Suffix); create table PhoneDirectory ( surname_id integer not null references Surnames, given_id integer not null references GivenNames, address_id integer not null references Addresses, PhoneNumber text collate nocase primary key ); create index PD_Surnameid on PhoneDirectory (surname_id); create index PD_Givenid on PhoneDirectory (given_id); create index PD_addressid on PhoneDirectory (address_id); create view v_PhoneDirectory as select Surname, Givenname, Street_No, Suffix, Streetname, PhoneNumber from PhoneDirectory natural join Addresses natural join Streets natural join GivenNames natural join Surnames; Then you use the view for all your queries of the PhoneDirectory. This works very well if the optimizer properly prunes out any unnecessary joins caused by unreferenced columns in the view. Even if not, all queries will run very quickly even with billions of records in the PhoneDirectory. (NATURAL JOIN is syntactic sugar creating a WHERE clause that joins on like named columns). This database would be in BCNF normal form. (Although the streetno and suffix ought to be moved out to a separate table(s) if you need 4th or 5th normal). The model gets very much more complicated if you also have to handle municipal names, city names, etc. Note that with this structure you can find all the "John" living on "Queen Street" in just nanoseconds. Using it to produce a printed phone book, however, is not very efficient. The structure is designed to optimize querying and eliminate update anomolies. Since printed phone books are only produced once a year, the fact that this operation is not efficient is immaterial.