Re: [SQL] surrogate key or not?

2004-07-23 Thread Karsten Hilbert
Josh, sad,

> create table diagnosis (
> pk serial primary key,
> fk_patient integer
> not null
> references patient(pk)
> on update cascade
> on delete cascade,
> narrative text
> not null,
> unique(fk_patient, narrative)
> );
> 
> This was obviously created so that a patient could have multiple diagnoses.   
> However, there is no information in the table to indicate *why* there are 
> multiple diagnoses.
Because there is no information to be had on this fact. The
patient IS afflicted by such or she is not. There is no why.

> And you are using a real key based on a long text 
> field;
Yes, but for various concerns I am not using it as the primary
key, just making sure it is unique. I was just trying to
ascertain myself that this is OK to do from a database insider
point of view.

> always hazardous, as there are many ways to phrase the same 
> information and duplication is likely.
But that is at the discreetion of the user/doctor and nothing
that can be enforced at the DB level (no, don't start thinking
about coding systems/classifications).

> To do it in english, your postulates look like:
> 
> PATIENT 67 was given a diagnosis of WATER ON THE KNEE.
> PATIENT 456 was given a diagnosis of ACUTE HYPOCHONDRIA.
Hm, I don't see anything wrong with that (I'm a doctor). The
plain information that Patient 456 is known to have suffered
bouts of ACUTE HYPOCHONDRIA is invaluable when dealing with an
agitated, psychically decompensated, hyperventilating patient
456.

> But this is a bit sketchy.   Who made these diagnoses?
I may or may not care. Our actual schema does, of course,
carry that information.

> When did they make them?
We'd be happy if we always knew.

>Why?
That's of marginal concern, actually, and the answer just
flows from the narrative of the medical record. But even if
there's no narrative there the "fact" alone helps.

> create table diagnosis (
> pk serial primary key,
> fk_patient integer  references patient(pk),
> fk_visit integer references visits(pk),
> fk_complaint integer references complaints(pk)
Nope, this doesn't belong here AT ALL from a medical point of
view. Diagnoses and complaints don't have any rational
relationship. This is life.

> fk_staff integer references medical_staff(pk)
> narrative text,
> unique(fk_patient, fk_visit, fk_complaint, fk_staff)
> );
And in fact our real tables ARE pretty much like that :-)

> PATIENT 67 was given a diagnosis by STAFF MEMBER 12 on his VISIT #3
> in response to NOT BEING ABLE TO WALK of WATER ON THE KNEE
> PATIENT 456 was given a diagnosis by STAFF MEMBER 19 on his VISIT #192
> in response to THE CREEPY-CRAWLIES of ACUTE HYPOCHONDRIA
That'd by a psychosis ;-)

> It also allows you to establish a much more useful key; it's reasonable to 
> expect that a single staff member on one visit in response to one complaint 
> would only give one diagnosis.
Entirely false and a possible sign of inappropriate care.

> Otherwise, you have more than database 
> problems.  And it prevents you from having to rely on a flaky long text key.
Flaky long text is what kept people reasonably well in health
for the last, what, five thousand years ? I rely on it
countless times every single day.

BTW, our full schema is here:

http://www.hherb.com/gnumed/schema/

Lot's of it isn't in the state yet where we want it but we are
getting there - or so I think.

Karsten Hilbert, MD, PhD
Leipzig, Germany
-- 
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [SQL] surrogate key or not?

2004-07-23 Thread Kenneth Gonsalves
On Friday 23 July 2004 12:27 pm, Karsten Hilbert wrote:

> BTW, our full schema is here:
>
> http://www.hherb.com/gnumed/schema/

i was looking at your schema. i'm not a database pro, but in master tables i 
see you have made the serial id as the primary key. i do it this way:

id  serial unique
namevarchar(25) not null
primary key is name - after all, you are going to search this on name arent 
you? or is there some advantage in doing it your way?

also, how did you get that neatly formatted output of the schema?
-- 
regards
kg

http://www.onlineindianhotels.net - hotel bookings reservations in over 4600 
hotels in India
http://www.ootygolfclub.org

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [SQL] surrogate key or not?

2004-07-23 Thread Michael Glaesemann
On Jul 23, 2004, at 4:57 PM, Kenneth Gonsalves wrote:
On Friday 23 July 2004 12:27 pm, Karsten Hilbert wrote:
BTW, our full schema is here:
http://www.hherb.com/gnumed/schema/
i was looking at your schema. i'm not a database pro, but in master 
tables i
see you have made the serial id as the primary key. i do it this way:

id	serial unique
name	varchar(25) not null
primary key is name - after all, you are going to search this on name 
arent
you? or is there some advantage in doing it your way?
Kenneth,
You appear to be misunderstanding the purpose of a primary key. A 
primary key is used to ensure there is a way to identify each row 
uniquely. It is quite independent of which columns you may or may not 
want to search on. If name is not going to be necessarily unique in the 
table, it isn't a primary key.

Michael Glaesemann
grzm myrealbox com
---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [SQL] surrogate key or not?

2004-07-23 Thread Kenneth Gonsalves
On Friday 23 July 2004 03:29 pm, Michael Glaesemann wrote:

>
> You appear to be misunderstanding the purpose of a primary key. A
> primary key is used to ensure there is a way to identify each row
> uniquely. It is quite independent of which columns you may or may not
> want to search on. If name is not going to be necessarily unique in the
> table, it isn't a primary key.

ive not misunderstood anything. this is one of the tables in question:

address_type 
id serial PRIMARY KEY  
name text UNIQUE NOT NULL 

i think it is self explanatory 


-- 
regards
kg

http://www.onlineindianhotels.net - hotel bookings reservations in over 4600 
hotels in India
http://www.ootygolfclub.org

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [SQL] surrogate key or not?

2004-07-23 Thread Michael Glaesemann
On Jul 23, 2004, at 6:00 PM, Kenneth Gonsalves wrote:
On Friday 23 July 2004 03:29 pm, Michael Glaesemann wrote:
You appear to be misunderstanding the purpose of a primary key. A
primary key is used to ensure there is a way to identify each row
uniquely. It is quite independent of which columns you may or may not
want to search on. If name is not going to be necessarily unique in 
the
table, it isn't a primary key.
ive not misunderstood anything. this is one of the tables in question:
address_type
id serial PRIMARY KEY
name text UNIQUE NOT NULL
i think it is self explanatory

In the example you originally gave, there is no indication of name 
being a primary key:

On Friday 23 July 2004 12:27 pm, Karsten Hilbert wrote:
id	serial unique
name	varchar(25) not null
primary key is name - after all, you are going to search this on name 
arent
you? or is there some advantage in doing it your way?
Also, your explanation "after all, you are going to search..." did not 
mention row uniqueness at all. Sorry if this is not what you meant, but 
I can only go by what you've written.

Michael Glaesemann
grzm myrealbox com
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [SQL] surrogate key or not?

2004-07-23 Thread Kenneth Gonsalves
On Friday 23 July 2004 04:47 pm, Michael Glaesemann wrote:

> > id  serial unique
> > namevarchar(25) not null
> > primary key is name - after all, you are going to search this on name
> > arent
> > you? or is there some advantage in doing it your way?
>
> Also, your explanation "after all, you are going to search..." did not
> mention row uniqueness at all. Sorry if this is not what you meant, but
> I can only go by what you've written.

ok, i'll rephrase the whole thing:

i have a master table with two fields:
id serial unique
name varchar not null (and will be unique)

i always make 'name' the primary key, and since it is the primary key, i dont 
explicitly specify it as unique, and after postgres 7.3 came out, i have 
added the 'unique' constraint to the 'id'

on looking at the gnumed schema, i saw that although 'name' was unique, the 
serial key, 'id' was made the primary key. So i wondered why and whether 
there were advantages in doing it this way. 
-- 
regards
kg

http://www.onlineindianhotels.net - hotel bookings reservations in over 4600 
hotels in India
http://www.ootygolfclub.org

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [SQL] surrogate key or not?

2004-07-23 Thread Achilleus Mantzios

Regarding natural or surrogate keys...

It is often possible that a table definition does not depict
reality, meaning that the specification given at table design
phase was wrong, (or was later proved wrong).

I had a table "parts" like
\d parts
  Table "public.parts"
 Column  | Type  |   Modifiers
-+---+---
 partno  | character varying(20) | not null
 machtypeclsid   | integer   | not null
 partclsid   | integer   | not null
 picture | bytea |
 instructions| bytea |
.
Indexes:
"parts_ukey" primary key, btree (partno, machtypeclsid)
Foreign-key constraints:
"$2" FOREIGN KEY (partclsid) REFERENCES partclasses(id)
"$1" FOREIGN KEY (machtypeclsid) REFERENCES machtypesclasses(id)

Initially i was told, and found pretty natural, that a machine type
and a part number would fully identify a part.

Only to find out later, after a bunch of apps was written, that some 
makers, described, in the same machine type,
different parts (on differnt drawing numbers i.e. pages in maker's manual)
with the same part numbers.

The irony here, is that this was the only instance of natural primary keys
in my whole schema.

I had then to write a script to convert the table itself, as well as 
children tables to it, to the new schema.

Now the table looks like:

\d parts
  Table "public.parts"
 Column  | Type  |   Modifiers
-+---+---
 partno  | character varying(20) | not null
 machtypeclsid   | integer   | not null
 partclsid   | integer   | not null
 picture | bytea |
 instructions| bytea |
..
 id  | integer   | not null default 
nextval('public.parts_id_seq'::text)
 drgno   | character varying(20) |
Indexes:
"parts_pkey" primary key, btree (id)
"parts_ukey" unique, btree (drgno, partno, machtypeclsid)
Foreign-key constraints:
"$2" FOREIGN KEY (partclsid) REFERENCES partclasses(id)
"$1" FOREIGN KEY (machtypeclsid) REFERENCES machtypesclasses(id)

So, now if they decide, that apart from drgno (drawing number),
a new field "revision" must be added, that will be needed to identify 
"uniquely" the part, along with drgno,partno,machtypeclsid, 
it wouldn't mean a lot of changes.

In other words, with surrogate keys, you eliminate the chance
that your original design was flawed due to lack of important
initial knowledge.

A designer in the majority of cases, gets his input from people
of the subject being modeled.
Often these people fail to have the whole image described
when giving specs, hence all the trouble.

-- 
-Achilleus


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [SQL] surrogate key or not?

2004-07-23 Thread Janning Vygen
Am Freitag, 23. Juli 2004 12:02 schrieb Kenneth Gonsalves:
> ok, i'll rephrase the whole thing:
>
> i have a master table with two fields:
> id serial unique
> name varchar not null (and will be unique)
>
> i always make 'name' the primary key, and since it is the primary key, i
> dont explicitly specify it as unique, and after postgres 7.3 came out, i
> have added the 'unique' constraint to the 'id'
>
> on looking at the gnumed schema, i saw that although 'name' was unique, the
> serial key, 'id' was made the primary key. So i wondered why and whether
> there were advantages in doing it this way.

Does your question relates to surrogate vs natural keys discussion?

I made some researches a few months ago and read a lot including:
http://www.intelligententerprise.com/print_article_flat.jhtml?article=/030320/605celko1_1.jhtml
http://www.winnetmag.com/Articles/Print.cfm?ArticleID=5113
http://www.dbpd.com/vault/9805xtra.htm
http://www.sapior.com/Resources/Surrogate_Keys/surrogate_keys.html
http://www.bits-pilani.ac.in/dlp-home/online/studymat/sszg515/lecture3.htm
http://www.bcarter.com/intsurr1.htm

i decided to use natural keys wherever possible and i have many primary keys 
spanning up to 4 attributes. And it works really fine. Performance is great, 
the schema is easy to use and i am so glad to use the natural key approach.

Writing SQL queries and php code is much easier!

By now i try to avoid surrogate keys (like with SERIALs datatype) wherever 
possible. Most articles advocate surrogate keys and at first it looks like an 
advantage in a web environment because selecting and transmitting a 
multi-column primary key in a form field ist very difficult.

Imagine a  element, but you have only one value to be returned. My 
trick here is to have the primary keys used in the select element saved in a 
session array and using the session array index as a select element value.

But the strongest argument for me is: All candidate keys have to be unique 
anyway. And postgresql builds an index anyway for every UNIQUE key, because 
thats the way postgresql checks uniqueness. So why add another artifical key 
with another index when you can use the one which is given anyway.

Think of usergroups identified by name and members which are identified by 
user groups name and email adress, then  you've got the pseudo schema 

create table usergroups (
   ug_name text,
   CONSTRAINT uq_ug UNIQUE (ug_name)
);

create table members (
   ug_name text,
   mb_email text,
   CONSTRAINT uq_mb UNIQUE (ug_name, mb_email),
   CONSTRAINT fk_ug_name FOREIGN KEY ug_name REFERENCES usergoups (ug_name)
);

so you have to indexes uq_mb and uq_ug anyway. So why dont use them as Primary 
Keys?? With two more attribute for a surrogate key like

   ug_id SERIAL PRIMARY KEY

in table usergroups and

   mb_id SERIAL PRIMARY KEY

you have additional 4 bytes to store for each table row and one more index for 
each table.

So my conclusion is: i dont see any benefit in using surrogate keys. But this 
must be wrong because so many people are using and advocating surrogate keys. 
They might only be useful in circumstances where no natural key is given.

kind regards,
janning



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [SQL] surrogate key or not?

2004-07-23 Thread Tom Lane
Michael Glaesemann <[EMAIL PROTECTED]> writes:
> You appear to be misunderstanding the purpose of a primary key. A 
> primary key is used to ensure there is a way to identify each row 
> uniquely. It is quite independent of which columns you may or may not 
> want to search on. If name is not going to be necessarily unique in the 
> table, it isn't a primary key.

The other standard reason for using a made-up value as primary key is
that it's under your control and you can guarantee it isn't going to
change: one record will have the same primary key for its entire life,
which vastly simplifies handling of foreign keys and so forth.

If you use any real-world datum as primary key you necessarily buy into
tracking changes in that real-world value.  And handling duplicates.
Names are certainly not unique, and they do change (marriage, etc) and
people do miskey them and then expect to be able to fix the error later.

As Achilleus' nearby story shows, you can have these problems (certainly
the misentry part) even with imported data that is allegedly someone
else's primary key; part numbers, USA social-security numbers, etc.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [SQL] surrogate key or not?

2004-07-23 Thread Josh Berkus
Achilleus,

> In other words, with surrogate keys, you eliminate the chance
> that your original design was flawed due to lack of important
> initial knowledge.

Well, you don't *eliminate* it, but you do decrease it.   

I'd say, yes, this is an important 4th reason:   

4) Your spec may be incorrect and surrogate keys make it easier to make design 
changes in production.

Once again, though, this is an *implementation* issue and not a *logic* issue, 
as I asserted ...

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [SQL] surrogate key or not?

2004-07-23 Thread Josh Berkus
Karsten,

> Because there is no information to be had on this fact. The
> patient IS afflicted by such or she is not. There is no why.

I begin to see why I spent $2000 this spring to have a doctor tell me what I 
*didn't* have ... 

> http://www.hherb.com/gnumed/schema/
> 
> Lot's of it isn't in the state yet where we want it but we are
> getting there - or so I think.

When I have time, sure!   But, this afternoon I am off to OSCON so I won't 
have a chance for 2 weeks at least.  Drop me a personal e-mail in August so I 
don't forget.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly