On Thu, December 9, 2010 1:00 pm, Lee Passey wrote: > On Wed, December 8, 2010 12:08 pm, Karen Coyle wrote: > >> I'd love to see it. Everyone's FRBR implementation seems to be >> different. Sometimes slightly different, sometimes wildly different. >> Maybe you can put your schema someplace online that could be pointed to? > > I had to go and open my big mouth. > > After searching my hard drives I came up with > "http://www.passkeysoft.com/~lee/OpenCat.sql".
Okay, this is where we get down to brass tacks. First of all, though, let me be clear that my goal was never to create a FRBR database; rather I was attempting to import OL data into an SQL, FRBR-like database where it could more easily be validated and manipulated. At the time I was doing this the OL 'type/work' record had been defined, but there were very few instances of them in the OL repository and it appeared that all the data intended to be stored in the OL 'work' record was also present in the 'edition' record. I decided to use the OL 'edition' exclusively to obtain my data. As I described in my first post on this issue, when I unwrapped the 'work' "onion", I discovered virtually nothing at it's core. (Interestingly, if you compare the OL definition of a work [http://openlibrary.org/type/work] with the FRBR definition of a work, the only real intersection is in the "title", and "other distinguishing characteristics" [assuming that everything other than the title in the OL definition is an "other distinguishing characteristic"]. In FRAD, even 'title' was removed from the list of attributes of a 'work'.) Thus, I ended up with a 'work' schema of: CREATE TABLE works ( work_idn SERIAL UNIQUE PRIMARY KEY, description text ); The 'work_idn' column is the "handle" or identifier that will be used to tie all the various pieces together, and the 'description' column is intended to hold any unclassified description we might want to associate with the work. Now we all know that it is useful to classify literary works according to different categories, aka classifications, aka subjects. According to FRBR, /any/ Group 1 or Group 2 entity can act as the subject of a work. Additionally, FRBR defined a third group of entities to represent subjects which are not part of Group 1 or Group 2. FRAD also extended the attribute list of Group 2 entities to include a "subject" attribute, which "Comprend des indices de classification." (Sorry, there is no English version of FRAD that I could find online.) I think that categorization of categories (e.g. subject is a group 1 or 2 FRBR entity, which we already have accounted for in the database, subject is a group 3 entity, or subject is a Library of Congress Classification Code) is worthwhile, but I didn't see the need to add a table for each of these categories. Thus, my subject table is: CREATE TABLE subjects( subject_idn SERIAL UNIQUE PRIMARY KEY, subject text, scheme text, UNIQUE( subject, scheme ) ); where the 'subject' column should hold the actual text of the subject field, and 'scheme' is designed to hold the categorization of the subject matter. The 'scheme' column should probably be attached to another table to control the vocabulary; while it should not be impossible, or even difficult, to add a new scheme, the list of schemes should be controlled in some manner. (It could validly be argued that some kind of vocabulary control should also be applied to the 'subject' column.) Next we need something to tie these objects together. To tie entities to the works they were responsible for, I defined CREATE TABLE creators ( entity_idn integer REFERENCES entities( entity_idn ), work_idn integer REFERENCES works( work_idn ), UNIQUE( entity_idn, work_idn ) ); and to tie subjects to works I defined CREATE TABLE work_subjects ( work_idn integer REFERENCES works( work_idn ), subject_idn integer REFERENCES subjects( subject_idn ), UNIQUE( work_idn, subject_idn ) ); To tie up the FRBR loose ends we would also need tables linking entities /as subject matter/ to works, and works /as subject matter/ to other works, but these tables I have not yet defined; doing so should be trivial. I now have a framework for a fairly complete FRBR-like schema. The last task is to complete the remainder of the WEM chain. At this point, I took the OL "type/edition" definition from http://openlibrary.org/type/edition (which is now over a year old), and went through the attribute list assigning whatever attribute definitions I could to the tables I had already defined (entities and subjects). I then wrote a program to evaluate a few million OL "type/edition" records to identify those attributes manifested in "type/edition" records which are not defined in the OL schema, either because they became obsolete but were never purged, or which had been added but never documented. Everything unaccounted for at this point needed to be allocated either to an "expression" record or a "manifestation" record (or not, as we shall see). Based on what is undoubtedly very subjective criteria, I ended up with an 'expression' record as follows: CREATE TABLE expressions( expression_idn SERIAL UNIQUE PRIMARY KEY, work_idn integer REFERENCES works( work_idn ), language varchar(3) REFERENCES languages( iso_639_2 ), title text, subtitle text, first_sentence text, notes text ); Note that 'title' and 'subtitle' are single-value fields. My assumption was that every time a work was assigned a new title it constituted a new expression; thus, a single expression could not have two titles. There are, of course, entities who contributed to the creation of a specific expression who are not themselves creators of the work (e.g. translators, editors, illustrators, etc.). Despite the fact that I was not sure (and still am not sure) if I could tease the identity of these contributors and their roles out of the OL data sets, I nonetheless defined a way to relate expressions to those contributors who could not be considered as authors of the work: CREATE TABLE contributors ( entity_idn integer REFERENCES entities( entity_idn ), expression_idn integer REFERENCES expressions( expression_idn ), role varchar(24), UNIQUE( entity_idn, expression_idn, role ) ); Similarly, there appear to be certain events associated with expressions, such as copyrights and first publication, and it seemed best to include those in the 'events' table described in part three, and then simply tie each event to the expression: CREATE TABLE expression_events( expression_idn integer REFERENCES expressions( expression_idn ), event_idn integer REFERENCES events( event_idn ), event_type varchar(32), location text, description text ); (This definition is, of course, subject to my earlier comments about restructuring the event table to move location and description to that record.) One last set of records worth mentioning at this point is that relating to URLs. It is very useful in this internet age to preserve URLs linking one topic to another, and it is quite conceivable that a URL could relate to an author, an entire work, a particular expression, or a specific manifestation. Thus, I defined the following tables: CREATE TABLE links ( scheme varchar(32), url text ); CREATE TABLE expression_links ( expression_idn integer REFERENCES expressions( expression_idn ) ) INHERITS (links); CREATE TABLE work_links ( work_idn integer REFERENCES works( work_idn ) ) INHERITS (links); DROP INDEX entity_links_entity_idn_idx; CREATE TABLE entity_links( entity_idn integer REFERENCES entities( entity_idn ), ) INHERITS( links ); To bring this exposition to a rather abrupt end, it was my heretofore unrealized intention at this point to take all the OL data not accounted for in the foregoing tables, and to construct a 'manifestation' table to hold the remaining data not accounted for by any previously defined record, with a table tying entities to manifestations ('realisers'?) and a table tying events to manifestations (e.g. publication). It should be possible to tie entities and other data to manifestations through events, but I haven't quite thought that through. (A 'publication' event could have a date, a place, and links to entities as participants; the event could then be linked to a manifestation, and by the transitive property the event participants would be linked to the manifestation). I hope this series has been interesting, even if not enlightening. Cheers. _______________________________________________ Ol-tech mailing list [email protected] http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech To unsubscribe from this mailing list, send email to [email protected]
