On Thu, December 9, 2010 1:00 pm, Lee Passey wrote:

> On Wed, December 8, 2010 12:08 pm, Karen Coyle wrote:
>
>> I'd love to see it. Everyone's FRBR implementation seems to be
>> different. Sometimes slightly different, sometimes wildly different.
>> Maybe you can put your schema someplace online that could be pointed to?
>
> I had to go and open my big mouth.
>
> After searching my hard drives I came up with
> "http://www.passkeysoft.com/~lee/OpenCat.sql";.

Okay, this is where we get down to brass tacks.

First of all, though, let me be clear that my goal was never to create a 
FRBR database; rather I was attempting to import OL data into an SQL, 
FRBR-like database where it could more easily be validated and manipulated.

At the time I was doing this the OL 'type/work' record had been defined, 
but there were very few instances of them in the OL repository and it 
appeared that all the data intended to be stored in the OL 'work' record 
was also present in the 'edition' record. I decided to use the OL 
'edition' exclusively to obtain my data.

As I described in my first post on this issue, when I unwrapped the 
'work' "onion", I discovered virtually nothing at it's core. 
(Interestingly, if you compare the OL definition of a work 
[http://openlibrary.org/type/work] with the FRBR definition of a work, 
the only real intersection is in the "title", and "other distinguishing 
characteristics" [assuming that everything other than the title in the 
OL definition is an "other distinguishing characteristic"]. In FRAD, 
even 'title' was removed from the list of attributes of a 'work'.)

Thus, I ended up with a 'work' schema of:

CREATE TABLE works (
      work_idn SERIAL UNIQUE PRIMARY KEY,
      description text
);

The 'work_idn' column is the "handle" or identifier that will be used to 
tie all the various pieces together, and the 'description' column is 
intended to hold any unclassified description we might want to associate 
with the work.

Now we all know that it is useful to classify literary works according 
to different categories, aka classifications, aka subjects. According to 
FRBR, /any/ Group 1 or Group 2 entity can act as the subject of a work. 
Additionally, FRBR defined a third group of entities to represent 
subjects which are not part of Group 1 or Group 2. FRAD also extended 
the attribute list of Group 2 entities to include a "subject" attribute, 
which "Comprend des indices de classification." (Sorry, there is no 
English version of FRAD that I could find online.)

I think that categorization of categories (e.g. subject is a group 1 or 
2 FRBR entity, which we already have accounted for in the database, 
subject is a group 3 entity, or subject is a Library of Congress 
Classification Code) is worthwhile, but I didn't see the need to add a 
table for each of these categories. Thus, my subject table is:

CREATE TABLE subjects(
     subject_idn SERIAL UNIQUE PRIMARY KEY,
     subject text,
     scheme text,
     UNIQUE( subject, scheme )
);

where the 'subject' column should hold the actual text of the subject 
field, and 'scheme' is designed to hold the categorization of the 
subject matter. The 'scheme' column should probably be attached to 
another table to control the vocabulary; while it should not be 
impossible, or even difficult, to add a new scheme, the list of schemes 
should be controlled in some manner. (It could validly be argued that 
some kind of vocabulary control should also be applied to the 'subject' 
column.)

Next we need something to tie these objects together. To tie entities to 
the works they were responsible for, I defined

CREATE TABLE creators (
     entity_idn integer REFERENCES entities( entity_idn ),
     work_idn integer REFERENCES works( work_idn ),
     UNIQUE( entity_idn, work_idn )
);

and to tie subjects to works I defined

CREATE TABLE work_subjects (
     work_idn integer REFERENCES works( work_idn ),
     subject_idn integer REFERENCES subjects( subject_idn ),
     UNIQUE( work_idn, subject_idn )
);

To tie up the FRBR loose ends we would also need tables linking entities 
/as subject matter/ to works, and works /as subject matter/ to other 
works, but these tables I have not yet defined; doing so should be trivial.

I now have a framework for a fairly complete FRBR-like schema. The last 
task is to complete the remainder of the WEM chain. At this point, I 
took the OL "type/edition" definition from 
http://openlibrary.org/type/edition (which is now over a year old), and 
went through the attribute list assigning whatever attribute definitions 
I could to the tables I had already defined (entities and subjects). I 
then wrote a program to evaluate a few million OL "type/edition" records 
to identify those attributes manifested in "type/edition" records which 
are not defined in the OL schema, either because they became obsolete 
but were never purged, or which had been added but never documented. 
Everything unaccounted for at this point needed to be allocated either 
to an "expression" record or a "manifestation" record (or not, as we 
shall see).

Based on what is undoubtedly very subjective criteria, I ended up with 
an 'expression' record as follows:

CREATE TABLE expressions(
     expression_idn SERIAL UNIQUE PRIMARY KEY,
     work_idn integer REFERENCES works( work_idn ),
     language varchar(3) REFERENCES languages( iso_639_2 ),
     title text,
     subtitle text,
     first_sentence text,
     notes text
);

Note that 'title' and 'subtitle' are single-value fields. My assumption 
was that every time a work was assigned a new title it constituted a new 
expression; thus, a single expression could not have two titles.

There are, of course, entities who contributed to the creation of a 
specific expression who are not themselves creators of the work (e.g. 
translators, editors, illustrators, etc.). Despite the fact that I was 
not sure (and still am not sure) if I could tease the identity of these 
contributors and their roles out of the OL data sets, I nonetheless 
defined a way to relate expressions to those contributors who could not 
be considered as authors of the work:

CREATE TABLE contributors (
     entity_idn integer REFERENCES entities( entity_idn ),
     expression_idn integer REFERENCES expressions( expression_idn ),
     role varchar(24),
     UNIQUE( entity_idn, expression_idn, role )
);

Similarly, there appear to be certain events associated with 
expressions, such as copyrights and first publication, and it seemed 
best to include those in the 'events' table described in part three, and 
then simply tie each event to the expression:

CREATE TABLE expression_events(
     expression_idn integer REFERENCES expressions( expression_idn ),
     event_idn integer REFERENCES events( event_idn ),
     event_type varchar(32),
     location text,
     description text
);

(This definition is, of course, subject to my earlier comments about 
restructuring the event table to move location and description to that 
record.)

One last set of records worth mentioning at this point is that relating 
to URLs. It is very useful in this internet age to preserve URLs linking 
one topic to another, and it is quite conceivable that a URL could 
relate to an author, an entire work, a particular expression, or a 
specific manifestation. Thus, I defined the following tables:

CREATE TABLE links (
     scheme varchar(32),
     url text
);

CREATE TABLE expression_links (
     expression_idn integer REFERENCES expressions( expression_idn )
) INHERITS (links);

CREATE TABLE work_links (
     work_idn integer REFERENCES works( work_idn )
) INHERITS (links);

DROP INDEX entity_links_entity_idn_idx;
CREATE TABLE entity_links(
     entity_idn integer REFERENCES entities( entity_idn ),
) INHERITS( links );

To bring this exposition to a rather abrupt end, it was my heretofore 
unrealized intention at this point to take all the OL data not accounted 
for in the foregoing tables, and to construct a 'manifestation' table to 
hold the remaining data not accounted for by any previously defined 
record, with a table tying entities to manifestations ('realisers'?) and 
a table tying events to manifestations (e.g. publication). It should be 
possible to tie entities and other data to manifestations through 
events, but I haven't quite thought that through. (A 'publication' event 
could have a date, a place, and links to entities as participants; the 
event could then be linked to a manifestation, and by the transitive 
property the event participants would be linked to the manifestation).

I hope this series has been interesting, even if not enlightening.

Cheers.

_______________________________________________
Ol-tech mailing list
[email protected]
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
To unsubscribe from this mailing list, send email to 
[email protected]

Reply via email to