On 5/31/07, P Kishor <[EMAIL PROTECTED]> wrote:
On 5/31/07, Andrew Finkenstadt <[EMAIL PROTECTED]> wrote: > On 5/31/07, P Kishor <[EMAIL PROTECTED]> wrote: > > > > I realize that I didn't explain the problem well enough (thus is the > > peril of being too close to it for a long time). > > > > I can have an (org <--> org) relationship or an (org <--> person) > > relationship as well as a (person <--> person) relationship. To make > > matters worse, a person or an org can have multiple relationships > > based a role. Here is an example. > > > > John is the Director of Forestry Dept. > > Jane is the Head of the Economics Dept. > > Joe is a Scientist at the Forestry Dept. > > > > rel1: John (Director: Forestry) has a joint project with Jane (Head: > > Economics) > > rel2: John (Director: Forestry) is the boss of Joe (Scientist: Forestry) > > rel3: John (Person: null) is a friend of Joe (Person: null) > > rel4: John (Person: null) goes to the same church as Jane (Person: null) > > > > and so on. > > > > Well, I really have three "vectors" per "node" -- person, role, > > organization. In each node, either person or organization can be null > > (obviously, role will never be null because every node or entity has a > > purpose). Each of these vectors can be defined by a finite number of > > meaningful properties -- for example, a person has a name, nickname, > > address, contact information, and so on. Even the role can have a > > level of importance -- a "secretary" at the Forestry department is not > > as important as the "Director," so the role can have a weight, and so > > on. > > > > This is the problem that I have to model. I am trying to devise a > > storage structure for something like this in SQLite. > > > > > > > Been there, done that. About 15 years ago. oh, how I hate you ;-) > We called the model CAPPR: Company, Address, Person, Phone, and Roles. .. > > So bottom line, you're going to need your ENTITY tables, and your > RELATIONSHIP tables, and your ROLE tables. You may also need "lookup > tables" to help enforce data cleanliness by reducing spelling errors or > similar issues. .. So, you are telling me that you did it, but I really want to know "how" to do it. I am sure this is not a unique problem I am trying to solve. I just am getting all knotted up trying to figure out a sane db structure to model this within so I can put data in and take data out without major contortions and yet have it be flexible and scalable for different kinds of relationships. So, the real CAPPR will be if you give me a few db design guidelines here. ;-) -- Puneet Kishor http://punkish.eidesis.org/ Nelson Inst. for Env. Studies, UW-Madison http://www.nelson.wisc.edu/ Open Source Geospatial Foundation http://www.osgeo.org/education/
The database was rather simple in pictures, but complex by the time you added everything in to help the database help your program stay out of trouble. We chose to identify a person by various sub-fields, and since we were not internationalization-aware, we didn't have to have our innocence sullied by the different positions of family names, given names, surnames, and alternate names based on who you were in relationship to. (Japanese and asian language names, germanic, russian, spanish, etc.) So our sub-fields were honorific (Sir, Don, The Honorable), title (Mr. Mrs. Dr. Reverend, etc), first name, middle name, last name, and something we could never decide on a name for: Post-name (the III, Junior, etc), and the degreed position (Ph.D., M.S.C.A, etc). Data entry of this name could be by-field (preferred) or free text, which the software then parsed into the separate fields for correction, and easy searching later. A little bit of human effort here helped increase the utility of their contact management activities later. We identified address the same way the post office does: street number, pre-direction, street name, street type, post-direction, suite/apartment qualififier, suite/apartment number, city, state, zip, zip+4. The software used address correction software certified for bulk mailing purposes with the USPS. Company name was almost the simplest. There was a freetext data entry field, and then the actual database field. Abbreviations in the freetext were expanded to the full name in the database field. Phone numbers had country codes, area codes, telephone numbers, extensions, passcodes, and special dialing instructions. Roles were multi-level. Roles were identified by a unique ID number, and then split into appropriateness (which relationship it could describe [person-company, company-address, person-address, person-phone, company-phone, address-phone), the directionality description (left to right, right to left), and then into categories. Next, since there were 4 entity tables, there were 6 relationship tables whose foreign keys were the corresponding entries in the entity tables & the role table, plus a free-text field for notes or comments. Each column whose content was restricted to a list of standardized values (the 50 states + canadian provinces, armed force theatres, territories, etc; the list of titles or post-fix names; the directionality of streets, etc) had a lookup table against which the column was validated. We found that these lookup tables would feed directly into the user interface as a pop-up list. Each lookup table had both a key entry, and a display value, and an on-screen order so that more frequently used options were near the top of the list. Each and every table contained a set of columns describing who entered the data, when it was entered, and when it was last modified (and by whom). Each and every table had a journal table recording changes to the table by recording previous-value tuples. And last, but extremely important for debugging and auditability, there was a transaction table which recorded the LOGICAL operations performed against the databases. You could, in theory, replay the transaction table against an empty database and your data would be re-created correctly, step by step. We did, in practice, find bugs by replaying reported problems and going "ah ha! They really did want to enter in a 25-letter last name." So, while I haven't actually included SQL statements that build the database, the description above should be sufficient to do so. Hope that helps.