Re: [sqlite] Re: db design mixing different entities in the same table

2007-05-31 Thread Andrew Finkenstadt

On 6/1/07, Andrew Finkenstadt <[EMAIL PROTECTED]> wrote:


On 5/31/07, P Kishor <[EMAIL PROTECTED]> wrote:
>
> On 5/31/07, Andrew Finkenstadt <[EMAIL PROTECTED]> wrote:
> > On 5/31/07, P Kishor <[EMAIL PROTECTED]> wrote:
> > >
> > > I realize that I didn't explain the problem well enough (thus is the
> > > peril of being too close to it for a long time).
> > >
> > > I can have an (org <--> org) relationship or an (org <--> person)
> > > relationship as well as a (person <--> person) relationship. To make
> > > matters worse, a person or an org can have multiple relationships
> > > based a role. Here is an example.
> > >
> > > John is the Director of Forestry Dept.
> > > Jane is the Head of the Economics Dept.
> > > Joe is a Scientist at the Forestry Dept.
> > >
> > > rel1: John (Director: Forestry) has a joint project with Jane (Head:
>
> > > Economics)
> > > rel2: John (Director: Forestry) is the boss of Joe (Scientist:
> Forestry)
> > > rel3: John (Person: null) is a friend of Joe (Person: null)
> > > rel4: John (Person: null) goes to the same church as Jane (Person:
> null)
> > >
> > > and so on.
> > >
> > > Well, I really have three "vectors" per "node" -- person, role,
> > > organization. In each node, either person or organization can be
> null
> > > (obviously, role will never be null because every node or entity has
> a
> > > purpose). Each of these vectors can be defined by a finite number of
> > > meaningful properties -- for example, a person has a name, nickname,
>
> > > address, contact information, and so on. Even the role can have a
> > > level of importance -- a "secretary" at the Forestry department is
> not
> > > as important as the "Director," so the role can have a weight, and
> so
> > > on.
> > >
> > > This is the problem that I have to model. I am trying to devise a
> > > storage structure for something like this in SQLite.
> > >
> > >
> >
> >
> > Been there, done that.  About 15 years ago.
>
> oh, how I hate you ;-)
>
> > We called the model CAPPR:  Company, Address, Person, Phone, and
> Roles.
> ..
> >
> > So bottom line, you're going to need your ENTITY tables, and your
> > RELATIONSHIP tables, and your ROLE tables.  You may also need "lookup
> > tables" to help enforce data cleanliness by reducing spelling errors
> or
> > similar issues.
> ..
>
> So, you are telling me that you did it, but I really want to know
> "how" to do it. I am sure this is not a unique problem I am trying to
> solve. I just am getting all knotted up trying to figure out a sane db
> structure to model this within so I can put data in and take data out
> without major contortions and yet have it be flexible and scalable for
> different kinds of relationships.
>
> So, the real CAPPR will be if you give me a few db design guidelines
> here.
>
> ;-)
>
> --
> Puneet Kishor http://punkish.eidesis.org/
> Nelson Inst. for Env. Studies, UW-Madison http://www.nelson.wisc.edu/
> Open Source Geospatial Foundation http://www.osgeo.org/education/
>


The database was rather simple in pictures, but complex by the time you
added everything in to help the database help your program stay out of
trouble.

We chose to identify a person by various sub-fields, and since we were not
internationalization-aware, we didn't have to have our innocence sullied by
the different positions of family names, given names, surnames, and
alternate names based on who you were in relationship to.  (Japanese and
asian language names, germanic, russian, spanish, etc.)  So our sub-fields
were honorific (Sir, Don, The Honorable), title (Mr. Mrs. Dr. Reverend,
etc), first name, middle name, last name, and something we could never
decide on a name for:  Post-name (the III, Junior, etc), and the degreed
position ( Ph.D., M.S.C.A, etc).

Data entry of this name could be by-field (preferred) or free text, which
the software then parsed into the separate fields for correction, and easy
searching later.  A little bit of human effort here helped increase the
utility of their contact management activities later.

We identified address the same way the post office does: street number,
pre-direction, street name, street type, post-direction, suite/apartment
qualififier, suite/apartment number, city, state, zip, zip+4.  The software
used address correction software certified for bulk mailing purposes with
the USPS.

Company name was almost the simplest.  There was a freetext data entry
field, and then the actual database field.  Abbreviations in the freetext
were expanded to the full name in the database field.

Phone numbers had country codes, area codes, telephone numbers,
extensions, passcodes, and special dialing instructions.

Roles were multi-level.  Roles were identified by a unique ID number, and
then split into appropriateness (which relationship it could describe
[person-company, company-address, person-address, person-phone,
company-phone, address-phone), the directionality description (left to
right, right to left), and then into categories.

Next, 

Re: [sqlite] Re: db design mixing different entities in the same table

2007-05-31 Thread Andrew Finkenstadt

On 5/31/07, P Kishor <[EMAIL PROTECTED]> wrote:


On 5/31/07, Andrew Finkenstadt <[EMAIL PROTECTED]> wrote:
> On 5/31/07, P Kishor <[EMAIL PROTECTED]> wrote:
> >
> > I realize that I didn't explain the problem well enough (thus is the
> > peril of being too close to it for a long time).
> >
> > I can have an (org <--> org) relationship or an (org <--> person)
> > relationship as well as a (person <--> person) relationship. To make
> > matters worse, a person or an org can have multiple relationships
> > based a role. Here is an example.
> >
> > John is the Director of Forestry Dept.
> > Jane is the Head of the Economics Dept.
> > Joe is a Scientist at the Forestry Dept.
> >
> > rel1: John (Director: Forestry) has a joint project with Jane (Head:
> > Economics)
> > rel2: John (Director: Forestry) is the boss of Joe (Scientist:
Forestry)
> > rel3: John (Person: null) is a friend of Joe (Person: null)
> > rel4: John (Person: null) goes to the same church as Jane (Person:
null)
> >
> > and so on.
> >
> > Well, I really have three "vectors" per "node" -- person, role,
> > organization. In each node, either person or organization can be null
> > (obviously, role will never be null because every node or entity has a
> > purpose). Each of these vectors can be defined by a finite number of
> > meaningful properties -- for example, a person has a name, nickname,
> > address, contact information, and so on. Even the role can have a
> > level of importance -- a "secretary" at the Forestry department is not
> > as important as the "Director," so the role can have a weight, and so
> > on.
> >
> > This is the problem that I have to model. I am trying to devise a
> > storage structure for something like this in SQLite.
> >
> >
>
>
> Been there, done that.  About 15 years ago.

oh, how I hate you ;-)

> We called the model CAPPR:  Company, Address, Person, Phone, and Roles.
..
>
> So bottom line, you're going to need your ENTITY tables, and your
> RELATIONSHIP tables, and your ROLE tables.  You may also need "lookup
> tables" to help enforce data cleanliness by reducing spelling errors or
> similar issues.
..

So, you are telling me that you did it, but I really want to know
"how" to do it. I am sure this is not a unique problem I am trying to
solve. I just am getting all knotted up trying to figure out a sane db
structure to model this within so I can put data in and take data out
without major contortions and yet have it be flexible and scalable for
different kinds of relationships.

So, the real CAPPR will be if you give me a few db design guidelines here.

;-)

--
Puneet Kishor http://punkish.eidesis.org/
Nelson Inst. for Env. Studies, UW-Madison http://www.nelson.wisc.edu/
Open Source Geospatial Foundation http://www.osgeo.org/education/




The database was rather simple in pictures, but complex by the time you
added everything in to help the database help your program stay out of
trouble.

We chose to identify a person by various sub-fields, and since we were not
internationalization-aware, we didn't have to have our innocence sullied by
the different positions of family names, given names, surnames, and
alternate names based on who you were in relationship to.  (Japanese and
asian language names, germanic, russian, spanish, etc.)  So our sub-fields
were honorific (Sir, Don, The Honorable), title (Mr. Mrs. Dr. Reverend,
etc), first name, middle name, last name, and something we could never
decide on a name for:  Post-name (the III, Junior, etc), and the degreed
position (Ph.D., M.S.C.A, etc).

Data entry of this name could be by-field (preferred) or free text, which
the software then parsed into the separate fields for correction, and easy
searching later.  A little bit of human effort here helped increase the
utility of their contact management activities later.

We identified address the same way the post office does: street number,
pre-direction, street name, street type, post-direction, suite/apartment
qualififier, suite/apartment number, city, state, zip, zip+4.  The software
used address correction software certified for bulk mailing purposes with
the USPS.

Company name was almost the simplest.  There was a freetext data entry
field, and then the actual database field.  Abbreviations in the freetext
were expanded to the full name in the database field.

Phone numbers had country codes, area codes, telephone numbers, extensions,
passcodes, and special dialing instructions.

Roles were multi-level.  Roles were identified by a unique ID number, and
then split into appropriateness (which relationship it could describe
[person-company, company-address, person-address, person-phone,
company-phone, address-phone), the directionality description (left to
right, right to left), and then into categories.

Next, since there were 4 entity tables, there were 6 relationship tables
whose foreign keys were the corresponding entries in the entity tables & the
role table, plus a free-text field for notes or comments.

Each 

Re: [sqlite] Re: db design mixing different entities in the same table

2007-05-31 Thread P Kishor

On 5/31/07, Andrew Finkenstadt <[EMAIL PROTECTED]> wrote:

On 5/31/07, P Kishor <[EMAIL PROTECTED]> wrote:
>
> I realize that I didn't explain the problem well enough (thus is the
> peril of being too close to it for a long time).
>
> I can have an (org <--> org) relationship or an (org <--> person)
> relationship as well as a (person <--> person) relationship. To make
> matters worse, a person or an org can have multiple relationships
> based a role. Here is an example.
>
> John is the Director of Forestry Dept.
> Jane is the Head of the Economics Dept.
> Joe is a Scientist at the Forestry Dept.
>
> rel1: John (Director: Forestry) has a joint project with Jane (Head:
> Economics)
> rel2: John (Director: Forestry) is the boss of Joe (Scientist: Forestry)
> rel3: John (Person: null) is a friend of Joe (Person: null)
> rel4: John (Person: null) goes to the same church as Jane (Person: null)
>
> and so on.
>
> Well, I really have three "vectors" per "node" -- person, role,
> organization. In each node, either person or organization can be null
> (obviously, role will never be null because every node or entity has a
> purpose). Each of these vectors can be defined by a finite number of
> meaningful properties -- for example, a person has a name, nickname,
> address, contact information, and so on. Even the role can have a
> level of importance -- a "secretary" at the Forestry department is not
> as important as the "Director," so the role can have a weight, and so
> on.
>
> This is the problem that I have to model. I am trying to devise a
> storage structure for something like this in SQLite.
>
>


Been there, done that.  About 15 years ago.


oh, how I hate you ;-)


We called the model CAPPR:  Company, Address, Person, Phone, and Roles.

..


So bottom line, you're going to need your ENTITY tables, and your
RELATIONSHIP tables, and your ROLE tables.  You may also need "lookup
tables" to help enforce data cleanliness by reducing spelling errors or
similar issues.

..

So, you are telling me that you did it, but I really want to know
"how" to do it. I am sure this is not a unique problem I am trying to
solve. I just am getting all knotted up trying to figure out a sane db
structure to model this within so I can put data in and take data out
without major contortions and yet have it be flexible and scalable for
different kinds of relationships.

So, the real CAPPR will be if you give me a few db design guidelines here.

;-)

--
Puneet Kishor http://punkish.eidesis.org/
Nelson Inst. for Env. Studies, UW-Madison http://www.nelson.wisc.edu/
Open Source Geospatial Foundation http://www.osgeo.org/education/
-
collaborate, communicate, compete
=

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Re: db design mixing different entities in the same table

2007-05-31 Thread Andrew Finkenstadt

On 5/31/07, P Kishor <[EMAIL PROTECTED]> wrote:


I realize that I didn't explain the problem well enough (thus is the
peril of being too close to it for a long time).

I can have an (org <--> org) relationship or an (org <--> person)
relationship as well as a (person <--> person) relationship. To make
matters worse, a person or an org can have multiple relationships
based a role. Here is an example.

John is the Director of Forestry Dept.
Jane is the Head of the Economics Dept.
Joe is a Scientist at the Forestry Dept.

rel1: John (Director: Forestry) has a joint project with Jane (Head:
Economics)
rel2: John (Director: Forestry) is the boss of Joe (Scientist: Forestry)
rel3: John (Person: null) is a friend of Joe (Person: null)
rel4: John (Person: null) goes to the same church as Jane (Person: null)

and so on.

Well, I really have three "vectors" per "node" -- person, role,
organization. In each node, either person or organization can be null
(obviously, role will never be null because every node or entity has a
purpose). Each of these vectors can be defined by a finite number of
meaningful properties -- for example, a person has a name, nickname,
address, contact information, and so on. Even the role can have a
level of importance -- a "secretary" at the Forestry department is not
as important as the "Director," so the role can have a weight, and so
on.

This is the problem that I have to model. I am trying to devise a
storage structure for something like this in SQLite.





Been there, done that.  About 15 years ago.

At the time I worked for Homes and Land Publishing in Tallahassee, and we
built customizable software for the associate publishers who owned the
magazine franchises in various areas.  This software was used in their
offices to manage their contacts with brokerages, advertisers, and so on.

We called the model CAPPR:  Company, Address, Person, Phone, and Roles.  And
it was a many-to-many network between each of those entity types:  COMPANY
had zero or more ADDRESSES.  PERSON had zero or more ADDRESSES.  COMPANY (or
PERSON) had zero or more PHONE numbers, each with a different purpose.
COMPANY had zero or more PERSONS known to be associated with them, each with
one or more ROLES.

It worked very, very well, as it modeled the real world accurately.

One of the interesting things it let them determine was who gave out
non-unique addresses for different companies, helping them to reduce fraud
and to "know your customer" better.  (This was before the advent of
well-known databases of private mailbox companies like MAILBOXES ETC.)

So bottom line, you're going to need your ENTITY tables, and your
RELATIONSHIP tables, and your ROLE tables.  You may also need "lookup
tables" to help enforce data cleanliness by reducing spelling errors or
similar issues.

In our case we built our model on Oracle RDBMS with full referential
integrity enforced, which prevented a boatload of common programming errors
from messing up the relationships between persons, companies, addresses, and
phone numbers by deleting records that were still in use.  I don't recall if
SQLite enforces referential integrity, or merely accepts  the syntax against
a future update that will.

--andy


Re: [sqlite] Re: db design mixing different entities in the same table

2007-05-31 Thread P Kishor

On 5/31/07, Igor Tandetnik <[EMAIL PROTECTED]> wrote:

P Kishor <[EMAIL PROTECTED]> wrote:
> I am modeling entities and their relationships to each other. A
> classic network digraph kind of stuff. The entities are organizations
> or persons. A person may  be associated with none (org type
> "unassigned") or one organization, and, of course, an organization may
> have none or more persons in it.
>
> I started with
>
> CREATE TABLE orgs (
>  org_id INTEGER PRIMARY KEY,
>  .. bunch of org stuff ..
> );
>
> CREATE TABLE persons (
>  person_id INTEGER PRIMARY KEY,
>  org_idINTEGER   -- FK to orgs(org_id)
>  .. bunch of person stuff ..
> );
>
> CREATE TABLE relationships (
>  rel_id   INTEGER PRIMARY KEY,
>  from_id  INTEGER, -- person_id or org_id
>  to_idINTEGER, -- person_id or org_id
>  entity_type  INTEGER DEFAULT 0,   -- 0 is "org", 1 is "person"
>  relationship TEXT
> );

Why do you feel you need this last table at all? All the information is
already available from persons.org_id.



I realize that I didn't explain the problem well enough (thus is the
peril of being too close to it for a long time).

I can have an (org <--> org) relationship or an (org <--> person)
relationship as well as a (person <--> person) relationship. To make
matters worse, a person or an org can have multiple relationships
based a role. Here is an example.

John is the Director of Forestry Dept.
Jane is the Head of the Economics Dept.
Joe is a Scientist at the Forestry Dept.

rel1: John (Director: Forestry) has a joint project with Jane (Head: Economics)
rel2: John (Director: Forestry) is the boss of Joe (Scientist: Forestry)
rel3: John (Person: null) is a friend of Joe (Person: null)
rel4: John (Person: null) goes to the same church as Jane (Person: null)

and so on.

Well, I really have three "vectors" per "node" -- person, role,
organization. In each node, either person or organization can be null
(obviously, role will never be null because every node or entity has a
purpose). Each of these vectors can be defined by a finite number of
meaningful properties -- for example, a person has a name, nickname,
address, contact information, and so on. Even the role can have a
level of importance -- a "secretary" at the Forestry department is not
as important as the "Director," so the role can have a weight, and so
on.

This is the problem that I have to model. I am trying to devise a
storage structure for something like this in SQLite.


--
Puneet Kishor http://punkish.eidesis.org/
Nelson Inst. for Env. Studies, UW-Madison http://www.nelson.wisc.edu/
Open Source Geospatial Foundation http://www.osgeo.org/education/
-
collaborate, communicate, compete
=

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] Re: db design mixing different entities in the same table

2007-05-31 Thread Igor Tandetnik

P Kishor <[EMAIL PROTECTED]> wrote:

I am modeling entities and their relationships to each other. A
classic network digraph kind of stuff. The entities are organizations
or persons. A person may  be associated with none (org type
"unassigned") or one organization, and, of course, an organization may
have none or more persons in it.

I started with

CREATE TABLE orgs (
 org_id INTEGER PRIMARY KEY,
 .. bunch of org stuff ..
);

CREATE TABLE persons (
 person_id INTEGER PRIMARY KEY,
 org_idINTEGER   -- FK to orgs(org_id)
 .. bunch of person stuff ..
);

CREATE TABLE relationships (
 rel_id   INTEGER PRIMARY KEY,
 from_id  INTEGER, -- person_id or org_id
 to_idINTEGER, -- person_id or org_id
 entity_type  INTEGER DEFAULT 0,   -- 0 is "org", 1 is "person"
 relationship TEXT
);


Why do you feel you need this last table at all? All the information is 
already available from persons.org_id.


Igor Tandetnik 



-
To unsubscribe, send email to [EMAIL PROTECTED]
-