Hi, Lee. Thanks for spelling this out -- it's especially interesting to get your thoughts as you break up the data into its FRBR-like parts.
I want to inject one bit of info, however. FRAD is not a complete picture of FRBR -- You need to use FRAD and FRBR together even though they seem to overlap. FRAD is only for authority data, not bibliographic data, and that is why it does not have title on the Work entity -- it assumes that the title is already there as part of the bibliographic description. The elements it defines are ones that are added to the Work elements to create headings. There is supposed to be IFLA committee work to reconcile all of the FRs at some point and to present a single picture of them, but right now it's a bit of a jumbled mess. (Note also that FRSAD (subject authority data) has gone beyond FRBR and now declares that ANYTHING can be a subject, not just the ones listed in FRBR.) This means that "standard FR" would have a Work title, no title on Expressions, and then the book title on the Manifestation. Yet I find your division equally logical, and basically proving the point that some of us have made: that while the data elements may remain the same, their organization into FRBR-like entities will vary. If we concentrate of defining the data elements well, then different communities can create their entities as they wish but we will still have compatibility at the element level. kc Quoting Lee Passey <[email protected]>: > On Thu, December 9, 2010 1:00 pm, Lee Passey wrote: > >> On Wed, December 8, 2010 12:08 pm, Karen Coyle wrote: >> >>> I'd love to see it. Everyone's FRBR implementation seems to be >>> different. Sometimes slightly different, sometimes wildly different. >>> Maybe you can put your schema someplace online that could be pointed to? >> >> I had to go and open my big mouth. >> >> After searching my hard drives I came up with >> "http://www.passkeysoft.com/~lee/OpenCat.sql". > > Okay, this is where we get down to brass tacks. > > First of all, though, let me be clear that my goal was never to create a > FRBR database; rather I was attempting to import OL data into an SQL, > FRBR-like database where it could more easily be validated and manipulated. > > At the time I was doing this the OL 'type/work' record had been defined, > but there were very few instances of them in the OL repository and it > appeared that all the data intended to be stored in the OL 'work' record > was also present in the 'edition' record. I decided to use the OL > 'edition' exclusively to obtain my data. > > As I described in my first post on this issue, when I unwrapped the > 'work' "onion", I discovered virtually nothing at it's core. > (Interestingly, if you compare the OL definition of a work > [http://openlibrary.org/type/work] with the FRBR definition of a work, > the only real intersection is in the "title", and "other distinguishing > characteristics" [assuming that everything other than the title in the > OL definition is an "other distinguishing characteristic"]. In FRAD, > even 'title' was removed from the list of attributes of a 'work'.) > > Thus, I ended up with a 'work' schema of: > > CREATE TABLE works ( > work_idn SERIAL UNIQUE PRIMARY KEY, > description text > ); > > The 'work_idn' column is the "handle" or identifier that will be used to > tie all the various pieces together, and the 'description' column is > intended to hold any unclassified description we might want to associate > with the work. > > Now we all know that it is useful to classify literary works according > to different categories, aka classifications, aka subjects. According to > FRBR, /any/ Group 1 or Group 2 entity can act as the subject of a work. > Additionally, FRBR defined a third group of entities to represent > subjects which are not part of Group 1 or Group 2. FRAD also extended > the attribute list of Group 2 entities to include a "subject" attribute, > which "Comprend des indices de classification." (Sorry, there is no > English version of FRAD that I could find online.) > > I think that categorization of categories (e.g. subject is a group 1 or > 2 FRBR entity, which we already have accounted for in the database, > subject is a group 3 entity, or subject is a Library of Congress > Classification Code) is worthwhile, but I didn't see the need to add a > table for each of these categories. Thus, my subject table is: > > CREATE TABLE subjects( > subject_idn SERIAL UNIQUE PRIMARY KEY, > subject text, > scheme text, > UNIQUE( subject, scheme ) > ); > > where the 'subject' column should hold the actual text of the subject > field, and 'scheme' is designed to hold the categorization of the > subject matter. The 'scheme' column should probably be attached to > another table to control the vocabulary; while it should not be > impossible, or even difficult, to add a new scheme, the list of schemes > should be controlled in some manner. (It could validly be argued that > some kind of vocabulary control should also be applied to the 'subject' > column.) > > Next we need something to tie these objects together. To tie entities to > the works they were responsible for, I defined > > CREATE TABLE creators ( > entity_idn integer REFERENCES entities( entity_idn ), > work_idn integer REFERENCES works( work_idn ), > UNIQUE( entity_idn, work_idn ) > ); > > and to tie subjects to works I defined > > CREATE TABLE work_subjects ( > work_idn integer REFERENCES works( work_idn ), > subject_idn integer REFERENCES subjects( subject_idn ), > UNIQUE( work_idn, subject_idn ) > ); > > To tie up the FRBR loose ends we would also need tables linking entities > /as subject matter/ to works, and works /as subject matter/ to other > works, but these tables I have not yet defined; doing so should be trivial. > > I now have a framework for a fairly complete FRBR-like schema. The last > task is to complete the remainder of the WEM chain. At this point, I > took the OL "type/edition" definition from > http://openlibrary.org/type/edition (which is now over a year old), and > went through the attribute list assigning whatever attribute definitions > I could to the tables I had already defined (entities and subjects). I > then wrote a program to evaluate a few million OL "type/edition" records > to identify those attributes manifested in "type/edition" records which > are not defined in the OL schema, either because they became obsolete > but were never purged, or which had been added but never documented. > Everything unaccounted for at this point needed to be allocated either > to an "expression" record or a "manifestation" record (or not, as we > shall see). > > Based on what is undoubtedly very subjective criteria, I ended up with > an 'expression' record as follows: > > CREATE TABLE expressions( > expression_idn SERIAL UNIQUE PRIMARY KEY, > work_idn integer REFERENCES works( work_idn ), > language varchar(3) REFERENCES languages( iso_639_2 ), > title text, > subtitle text, > first_sentence text, > notes text > ); > > Note that 'title' and 'subtitle' are single-value fields. My assumption > was that every time a work was assigned a new title it constituted a new > expression; thus, a single expression could not have two titles. > > There are, of course, entities who contributed to the creation of a > specific expression who are not themselves creators of the work (e.g. > translators, editors, illustrators, etc.). Despite the fact that I was > not sure (and still am not sure) if I could tease the identity of these > contributors and their roles out of the OL data sets, I nonetheless > defined a way to relate expressions to those contributors who could not > be considered as authors of the work: > > CREATE TABLE contributors ( > entity_idn integer REFERENCES entities( entity_idn ), > expression_idn integer REFERENCES expressions( expression_idn ), > role varchar(24), > UNIQUE( entity_idn, expression_idn, role ) > ); > > Similarly, there appear to be certain events associated with > expressions, such as copyrights and first publication, and it seemed > best to include those in the 'events' table described in part three, and > then simply tie each event to the expression: > > CREATE TABLE expression_events( > expression_idn integer REFERENCES expressions( expression_idn ), > event_idn integer REFERENCES events( event_idn ), > event_type varchar(32), > location text, > description text > ); > > (This definition is, of course, subject to my earlier comments about > restructuring the event table to move location and description to that > record.) > > One last set of records worth mentioning at this point is that relating > to URLs. It is very useful in this internet age to preserve URLs linking > one topic to another, and it is quite conceivable that a URL could > relate to an author, an entire work, a particular expression, or a > specific manifestation. Thus, I defined the following tables: > > CREATE TABLE links ( > scheme varchar(32), > url text > ); > > CREATE TABLE expression_links ( > expression_idn integer REFERENCES expressions( expression_idn ) > ) INHERITS (links); > > CREATE TABLE work_links ( > work_idn integer REFERENCES works( work_idn ) > ) INHERITS (links); > > DROP INDEX entity_links_entity_idn_idx; > CREATE TABLE entity_links( > entity_idn integer REFERENCES entities( entity_idn ), > ) INHERITS( links ); > > To bring this exposition to a rather abrupt end, it was my heretofore > unrealized intention at this point to take all the OL data not accounted > for in the foregoing tables, and to construct a 'manifestation' table to > hold the remaining data not accounted for by any previously defined > record, with a table tying entities to manifestations ('realisers'?) and > a table tying events to manifestations (e.g. publication). It should be > possible to tie entities and other data to manifestations through > events, but I haven't quite thought that through. (A 'publication' event > could have a date, a place, and links to entities as participants; the > event could then be linked to a manifestation, and by the transitive > property the event participants would be linked to the manifestation). > > I hope this series has been interesting, even if not enlightening. > > Cheers. > > _______________________________________________ > Ol-tech mailing list > [email protected] > http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech > To unsubscribe from this mailing list, send email to > [email protected] > -- Karen Coyle [email protected] http://kcoyle.net ph: 1-510-540-7596 m: 1-510-435-8234 skype: kcoylenet _______________________________________________ Ol-tech mailing list [email protected] http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech To unsubscribe from this mailing list, send email to [email protected]
