On Thu, December 9, 2010 1:00 pm, Lee Passey wrote:

> On Wed, December 8, 2010 12:08 pm, Karen Coyle wrote:
>
>> I'd love to see it. Everyone's FRBR implementation seems to be
>> different. Sometimes slightly different, sometimes wildly different.
>> Maybe you can put your schema someplace online that could be pointed to?
>
> I had to go and open my big mouth.
>
> After searching my hard drives I came up with
> "http://www.passkeysoft.com/~lee/OpenCat.sql";. Looking at it, I can see that
> it is obviously incomplete; I suspect that much of the schema I wrote later on
> an ad-hoc basis. It will take some looking at the Java code and the database
> itself to flush it out (as in dogs flushing out the quail, not "flesh it out"
> as in giving it form and substance -- although there might be some of that
> required as well), but this first early draft might give a hint as to what I
> was trying to do.

Okay, I've looked at the code and the old database. It appears that what I've
posted really is as far as I got. With that in mind, here's how I've reduced
my theories to practice.

Let's start with FRBR group 1 -- "responsibleEntities", which I have simply
called "Entities" (I'm not very happy with this name--I've seen it variously
called "creators", "actors", "authors", etc.--but haven't yet come up with
anything more intuitive). While it is true that there is not anything that
comes anywhere close to being a unique, canonical "name" for an entity, from a
practical standpoint I needed /something/ to uses as a handle; something I so
I know when I'm talking about /this/ Entity and not /that/ Entity. I think
that this is the problem the IFLA was trying to solve when they specified that
a "Name" is a required attribute of a Group 1 entity. Well, if an assumed name
is as valid as a given name, I guess I could give an Entity a name as well,
valid inside my namespace. Thus, I could ensure both uniqueness and
immutability (something not supported by OL ids). Additionally, I felt it
would be useful to have some sort of enumeration of just what type of thing
the entity /was/ (obviously highly subjective) such as "person", "corp",
"gov", "ngo", etc.

So, the basic definition of the Entity table was:

CREATE TABLE entities (
    entity_idn SERIAL UNIQUE PRIMARY KEY,
    entity_type varchar(32)
);

The "entity_type" is defined as a string, but the intent is to limit the valid
values to a controlled set, something that cannot be done in the database
definition itself. The exact permissible values are even now only partially
defined.

An entity may be labeled by any number of names, so next we need an open-ended
list of names; each entry should be unique. As a name may be used by any
number of entities, the list must be non-volatile; a unique name may be added,
but no name may be removed.

CREATE TABLE names (
    name_idn SERIAL UNIQUE PRIMARY KEY,
    name varchar(512) UNIQUE,
    scheme text
);

The "scheme" column was included for an eye to the future, with the notion
that it may become necessary to record a namespace or categorization for the
name itself (e.g. "given name", "pen name", "translated name", "corporate
name"). On sober reflection, I think that property column should be moved to
the "entity_names" table. Which leads us to the "entity_names" table.

"Entities" to "Names" is a many-to-many relationship, which is represented in
a relational database as a table of intersection records. Each intersection
record ties two (or more) related records together into a one-to-one
relationship. Thus:

CREATE TABLE entity_names(
    entity_idn integer REFERENCES entities( entity_idn ),
    name_idn integer REFERENCES names( name_idn )
);

This is where I would move the "scheme" column, such that it can be said, "in
this instance this Entity has been assigned this Name which is being used
according to this Scheme (maybe 'usage' would be a more intuitive column
name).

In order to well and truly identify our archetypal Entity, we next need to
associate Events (and relevant dates) with it. As I see it, every Event has
two date/time values associated with it: the beginning, and the end. An event
may be virtually instantaneous (receipt of filing of articles of
incorporation) or it may go on for some time (the Thirty Years' War). I
originally designed a small table to record simple time spans for events, and
a more complex intersection table to relate a specific Entity with a specific
Event, to wit:

CREATE TABLE events (
    event_idn SERIAL UNIQUE PRIMARY KEY,
    begin_date date,
    end_date date,
    UNIQUE( begin_date, end_date )
);

CREATE TABLE entity_events(
    entity_idn integer REFERENCES entities( entity_idn ),
    event_idn integer REFERENCES events( event_idn ),
    event_type varchar(32),
    location text,
    description text
);

Looking back, I think I would move the "event_type", "location" and
"description" columns to the "events" table definition, and add a "role"
column to the "entity_events" table to record the role that the Entity played
in the Event. This structure is probably overkill for recording and
differentiating authors (I currently do not use the "location" or
"description" columns) but may be useful in the future to relate author data
to a genealogical database.

Using the foregoing database schema I believe I have been able to capture all
the data about Entities (except external links and biographical data) stored
in OpenLibrary cataloging data sets, and it seems to be a complete
implementation of the requirements of FRBR Group 1.

Next: The Exciting Concluding Chapter: Furrbuffizing OL Book Data!

_______________________________________________
Ol-tech mailing list
[email protected]
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
To unsubscribe from this mailing list, send email to 
[email protected]

Reply via email to