Re: [sqlite] Re: db design mixing different entities in the same table
On 6/1/07, Andrew Finkenstadt <[EMAIL PROTECTED]> wrote: On 5/31/07, P Kishor <[EMAIL PROTECTED]> wrote: > > On 5/31/07, Andrew Finkenstadt <[EMAIL PROTECTED]> wrote: > > On 5/31/07, P Kishor <[EMAIL PROTECTED]> wrote: > > > > > > I realize that I didn't explain the problem well enough (thus is the > > > peril of being too close to it for a long time). > > > > > > I can have an (org <--> org) relationship or an (org <--> person) > > > relationship as well as a (person <--> person) relationship. To make > > > matters worse, a person or an org can have multiple relationships > > > based a role. Here is an example. > > > > > > John is the Director of Forestry Dept. > > > Jane is the Head of the Economics Dept. > > > Joe is a Scientist at the Forestry Dept. > > > > > > rel1: John (Director: Forestry) has a joint project with Jane (Head: > > > > Economics) > > > rel2: John (Director: Forestry) is the boss of Joe (Scientist: > Forestry) > > > rel3: John (Person: null) is a friend of Joe (Person: null) > > > rel4: John (Person: null) goes to the same church as Jane (Person: > null) > > > > > > and so on. > > > > > > Well, I really have three "vectors" per "node" -- person, role, > > > organization. In each node, either person or organization can be > null > > > (obviously, role will never be null because every node or entity has > a > > > purpose). Each of these vectors can be defined by a finite number of > > > meaningful properties -- for example, a person has a name, nickname, > > > > address, contact information, and so on. Even the role can have a > > > level of importance -- a "secretary" at the Forestry department is > not > > > as important as the "Director," so the role can have a weight, and > so > > > on. > > > > > > This is the problem that I have to model. I am trying to devise a > > > storage structure for something like this in SQLite. > > > > > > > > > > > > Been there, done that. About 15 years ago. > > oh, how I hate you ;-) > > > We called the model CAPPR: Company, Address, Person, Phone, and > Roles. > .. > > > > So bottom line, you're going to need your ENTITY tables, and your > > RELATIONSHIP tables, and your ROLE tables. You may also need "lookup > > tables" to help enforce data cleanliness by reducing spelling errors > or > > similar issues. > .. > > So, you are telling me that you did it, but I really want to know > "how" to do it. I am sure this is not a unique problem I am trying to > solve. I just am getting all knotted up trying to figure out a sane db > structure to model this within so I can put data in and take data out > without major contortions and yet have it be flexible and scalable for > different kinds of relationships. > > So, the real CAPPR will be if you give me a few db design guidelines > here. > > ;-) > > -- > Puneet Kishor http://punkish.eidesis.org/ > Nelson Inst. for Env. Studies, UW-Madison http://www.nelson.wisc.edu/ > Open Source Geospatial Foundation http://www.osgeo.org/education/ > The database was rather simple in pictures, but complex by the time you added everything in to help the database help your program stay out of trouble. We chose to identify a person by various sub-fields, and since we were not internationalization-aware, we didn't have to have our innocence sullied by the different positions of family names, given names, surnames, and alternate names based on who you were in relationship to. (Japanese and asian language names, germanic, russian, spanish, etc.) So our sub-fields were honorific (Sir, Don, The Honorable), title (Mr. Mrs. Dr. Reverend, etc), first name, middle name, last name, and something we could never decide on a name for: Post-name (the III, Junior, etc), and the degreed position ( Ph.D., M.S.C.A, etc). Data entry of this name could be by-field (preferred) or free text, which the software then parsed into the separate fields for correction, and easy searching later. A little bit of human effort here helped increase the utility of their contact management activities later. We identified address the same way the post office does: street number, pre-direction, street name, street type, post-direction, suite/apartment qualififier, suite/apartment number, city, state, zip, zip+4. The software used address correction software certified for bulk mailing purposes with the USPS. Company name was almost the simplest. There was a freetext data entry field, and then the actual database field. Abbreviations in the freetext were expanded to the full name in the database field. Phone numbers had country codes, area codes, telephone numbers, extensions, passcodes, and special dialing instructions. Roles were multi-level. Roles were identified by a unique ID number, and then split into appropriateness (which relationship it could describe [person-company, company-address, person-address, person-phone, company-phone, address-phone), the directionality description (left to right, right to left), and then into categories. Next,
Re: [sqlite] Re: db design mixing different entities in the same table
On 5/31/07, P Kishor <[EMAIL PROTECTED]> wrote: On 5/31/07, Andrew Finkenstadt <[EMAIL PROTECTED]> wrote: > On 5/31/07, P Kishor <[EMAIL PROTECTED]> wrote: > > > > I realize that I didn't explain the problem well enough (thus is the > > peril of being too close to it for a long time). > > > > I can have an (org <--> org) relationship or an (org <--> person) > > relationship as well as a (person <--> person) relationship. To make > > matters worse, a person or an org can have multiple relationships > > based a role. Here is an example. > > > > John is the Director of Forestry Dept. > > Jane is the Head of the Economics Dept. > > Joe is a Scientist at the Forestry Dept. > > > > rel1: John (Director: Forestry) has a joint project with Jane (Head: > > Economics) > > rel2: John (Director: Forestry) is the boss of Joe (Scientist: Forestry) > > rel3: John (Person: null) is a friend of Joe (Person: null) > > rel4: John (Person: null) goes to the same church as Jane (Person: null) > > > > and so on. > > > > Well, I really have three "vectors" per "node" -- person, role, > > organization. In each node, either person or organization can be null > > (obviously, role will never be null because every node or entity has a > > purpose). Each of these vectors can be defined by a finite number of > > meaningful properties -- for example, a person has a name, nickname, > > address, contact information, and so on. Even the role can have a > > level of importance -- a "secretary" at the Forestry department is not > > as important as the "Director," so the role can have a weight, and so > > on. > > > > This is the problem that I have to model. I am trying to devise a > > storage structure for something like this in SQLite. > > > > > > > Been there, done that. About 15 years ago. oh, how I hate you ;-) > We called the model CAPPR: Company, Address, Person, Phone, and Roles. .. > > So bottom line, you're going to need your ENTITY tables, and your > RELATIONSHIP tables, and your ROLE tables. You may also need "lookup > tables" to help enforce data cleanliness by reducing spelling errors or > similar issues. .. So, you are telling me that you did it, but I really want to know "how" to do it. I am sure this is not a unique problem I am trying to solve. I just am getting all knotted up trying to figure out a sane db structure to model this within so I can put data in and take data out without major contortions and yet have it be flexible and scalable for different kinds of relationships. So, the real CAPPR will be if you give me a few db design guidelines here. ;-) -- Puneet Kishor http://punkish.eidesis.org/ Nelson Inst. for Env. Studies, UW-Madison http://www.nelson.wisc.edu/ Open Source Geospatial Foundation http://www.osgeo.org/education/ The database was rather simple in pictures, but complex by the time you added everything in to help the database help your program stay out of trouble. We chose to identify a person by various sub-fields, and since we were not internationalization-aware, we didn't have to have our innocence sullied by the different positions of family names, given names, surnames, and alternate names based on who you were in relationship to. (Japanese and asian language names, germanic, russian, spanish, etc.) So our sub-fields were honorific (Sir, Don, The Honorable), title (Mr. Mrs. Dr. Reverend, etc), first name, middle name, last name, and something we could never decide on a name for: Post-name (the III, Junior, etc), and the degreed position (Ph.D., M.S.C.A, etc). Data entry of this name could be by-field (preferred) or free text, which the software then parsed into the separate fields for correction, and easy searching later. A little bit of human effort here helped increase the utility of their contact management activities later. We identified address the same way the post office does: street number, pre-direction, street name, street type, post-direction, suite/apartment qualififier, suite/apartment number, city, state, zip, zip+4. The software used address correction software certified for bulk mailing purposes with the USPS. Company name was almost the simplest. There was a freetext data entry field, and then the actual database field. Abbreviations in the freetext were expanded to the full name in the database field. Phone numbers had country codes, area codes, telephone numbers, extensions, passcodes, and special dialing instructions. Roles were multi-level. Roles were identified by a unique ID number, and then split into appropriateness (which relationship it could describe [person-company, company-address, person-address, person-phone, company-phone, address-phone), the directionality description (left to right, right to left), and then into categories. Next, since there were 4 entity tables, there were 6 relationship tables whose foreign keys were the corresponding entries in the entity tables & the role table, plus a free-text field for notes or comments. Each
Re: [sqlite] Re: db design mixing different entities in the same table
On 5/31/07, Andrew Finkenstadt <[EMAIL PROTECTED]> wrote: On 5/31/07, P Kishor <[EMAIL PROTECTED]> wrote: > > I realize that I didn't explain the problem well enough (thus is the > peril of being too close to it for a long time). > > I can have an (org <--> org) relationship or an (org <--> person) > relationship as well as a (person <--> person) relationship. To make > matters worse, a person or an org can have multiple relationships > based a role. Here is an example. > > John is the Director of Forestry Dept. > Jane is the Head of the Economics Dept. > Joe is a Scientist at the Forestry Dept. > > rel1: John (Director: Forestry) has a joint project with Jane (Head: > Economics) > rel2: John (Director: Forestry) is the boss of Joe (Scientist: Forestry) > rel3: John (Person: null) is a friend of Joe (Person: null) > rel4: John (Person: null) goes to the same church as Jane (Person: null) > > and so on. > > Well, I really have three "vectors" per "node" -- person, role, > organization. In each node, either person or organization can be null > (obviously, role will never be null because every node or entity has a > purpose). Each of these vectors can be defined by a finite number of > meaningful properties -- for example, a person has a name, nickname, > address, contact information, and so on. Even the role can have a > level of importance -- a "secretary" at the Forestry department is not > as important as the "Director," so the role can have a weight, and so > on. > > This is the problem that I have to model. I am trying to devise a > storage structure for something like this in SQLite. > > Been there, done that. About 15 years ago. oh, how I hate you ;-) We called the model CAPPR: Company, Address, Person, Phone, and Roles. .. So bottom line, you're going to need your ENTITY tables, and your RELATIONSHIP tables, and your ROLE tables. You may also need "lookup tables" to help enforce data cleanliness by reducing spelling errors or similar issues. .. So, you are telling me that you did it, but I really want to know "how" to do it. I am sure this is not a unique problem I am trying to solve. I just am getting all knotted up trying to figure out a sane db structure to model this within so I can put data in and take data out without major contortions and yet have it be flexible and scalable for different kinds of relationships. So, the real CAPPR will be if you give me a few db design guidelines here. ;-) -- Puneet Kishor http://punkish.eidesis.org/ Nelson Inst. for Env. Studies, UW-Madison http://www.nelson.wisc.edu/ Open Source Geospatial Foundation http://www.osgeo.org/education/ - collaborate, communicate, compete = - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Re: db design mixing different entities in the same table
On 5/31/07, P Kishor <[EMAIL PROTECTED]> wrote: I realize that I didn't explain the problem well enough (thus is the peril of being too close to it for a long time). I can have an (org <--> org) relationship or an (org <--> person) relationship as well as a (person <--> person) relationship. To make matters worse, a person or an org can have multiple relationships based a role. Here is an example. John is the Director of Forestry Dept. Jane is the Head of the Economics Dept. Joe is a Scientist at the Forestry Dept. rel1: John (Director: Forestry) has a joint project with Jane (Head: Economics) rel2: John (Director: Forestry) is the boss of Joe (Scientist: Forestry) rel3: John (Person: null) is a friend of Joe (Person: null) rel4: John (Person: null) goes to the same church as Jane (Person: null) and so on. Well, I really have three "vectors" per "node" -- person, role, organization. In each node, either person or organization can be null (obviously, role will never be null because every node or entity has a purpose). Each of these vectors can be defined by a finite number of meaningful properties -- for example, a person has a name, nickname, address, contact information, and so on. Even the role can have a level of importance -- a "secretary" at the Forestry department is not as important as the "Director," so the role can have a weight, and so on. This is the problem that I have to model. I am trying to devise a storage structure for something like this in SQLite. Been there, done that. About 15 years ago. At the time I worked for Homes and Land Publishing in Tallahassee, and we built customizable software for the associate publishers who owned the magazine franchises in various areas. This software was used in their offices to manage their contacts with brokerages, advertisers, and so on. We called the model CAPPR: Company, Address, Person, Phone, and Roles. And it was a many-to-many network between each of those entity types: COMPANY had zero or more ADDRESSES. PERSON had zero or more ADDRESSES. COMPANY (or PERSON) had zero or more PHONE numbers, each with a different purpose. COMPANY had zero or more PERSONS known to be associated with them, each with one or more ROLES. It worked very, very well, as it modeled the real world accurately. One of the interesting things it let them determine was who gave out non-unique addresses for different companies, helping them to reduce fraud and to "know your customer" better. (This was before the advent of well-known databases of private mailbox companies like MAILBOXES ETC.) So bottom line, you're going to need your ENTITY tables, and your RELATIONSHIP tables, and your ROLE tables. You may also need "lookup tables" to help enforce data cleanliness by reducing spelling errors or similar issues. In our case we built our model on Oracle RDBMS with full referential integrity enforced, which prevented a boatload of common programming errors from messing up the relationships between persons, companies, addresses, and phone numbers by deleting records that were still in use. I don't recall if SQLite enforces referential integrity, or merely accepts the syntax against a future update that will. --andy
Re: [sqlite] Re: db design mixing different entities in the same table
On 5/31/07, Igor Tandetnik <[EMAIL PROTECTED]> wrote: P Kishor <[EMAIL PROTECTED]> wrote: > I am modeling entities and their relationships to each other. A > classic network digraph kind of stuff. The entities are organizations > or persons. A person may be associated with none (org type > "unassigned") or one organization, and, of course, an organization may > have none or more persons in it. > > I started with > > CREATE TABLE orgs ( > org_id INTEGER PRIMARY KEY, > .. bunch of org stuff .. > ); > > CREATE TABLE persons ( > person_id INTEGER PRIMARY KEY, > org_idINTEGER -- FK to orgs(org_id) > .. bunch of person stuff .. > ); > > CREATE TABLE relationships ( > rel_id INTEGER PRIMARY KEY, > from_id INTEGER, -- person_id or org_id > to_idINTEGER, -- person_id or org_id > entity_type INTEGER DEFAULT 0, -- 0 is "org", 1 is "person" > relationship TEXT > ); Why do you feel you need this last table at all? All the information is already available from persons.org_id. I realize that I didn't explain the problem well enough (thus is the peril of being too close to it for a long time). I can have an (org <--> org) relationship or an (org <--> person) relationship as well as a (person <--> person) relationship. To make matters worse, a person or an org can have multiple relationships based a role. Here is an example. John is the Director of Forestry Dept. Jane is the Head of the Economics Dept. Joe is a Scientist at the Forestry Dept. rel1: John (Director: Forestry) has a joint project with Jane (Head: Economics) rel2: John (Director: Forestry) is the boss of Joe (Scientist: Forestry) rel3: John (Person: null) is a friend of Joe (Person: null) rel4: John (Person: null) goes to the same church as Jane (Person: null) and so on. Well, I really have three "vectors" per "node" -- person, role, organization. In each node, either person or organization can be null (obviously, role will never be null because every node or entity has a purpose). Each of these vectors can be defined by a finite number of meaningful properties -- for example, a person has a name, nickname, address, contact information, and so on. Even the role can have a level of importance -- a "secretary" at the Forestry department is not as important as the "Director," so the role can have a weight, and so on. This is the problem that I have to model. I am trying to devise a storage structure for something like this in SQLite. -- Puneet Kishor http://punkish.eidesis.org/ Nelson Inst. for Env. Studies, UW-Madison http://www.nelson.wisc.edu/ Open Source Geospatial Foundation http://www.osgeo.org/education/ - collaborate, communicate, compete = - To unsubscribe, send email to [EMAIL PROTECTED] -
[sqlite] Re: db design mixing different entities in the same table
P Kishor <[EMAIL PROTECTED]> wrote: I am modeling entities and their relationships to each other. A classic network digraph kind of stuff. The entities are organizations or persons. A person may be associated with none (org type "unassigned") or one organization, and, of course, an organization may have none or more persons in it. I started with CREATE TABLE orgs ( org_id INTEGER PRIMARY KEY, .. bunch of org stuff .. ); CREATE TABLE persons ( person_id INTEGER PRIMARY KEY, org_idINTEGER -- FK to orgs(org_id) .. bunch of person stuff .. ); CREATE TABLE relationships ( rel_id INTEGER PRIMARY KEY, from_id INTEGER, -- person_id or org_id to_idINTEGER, -- person_id or org_id entity_type INTEGER DEFAULT 0, -- 0 is "org", 1 is "person" relationship TEXT ); Why do you feel you need this last table at all? All the information is already available from persons.org_id. Igor Tandetnik - To unsubscribe, send email to [EMAIL PROTECTED] -