I agree entirely with your analysis, Lee, but am still confused about your objection to assigning a Work a title!
Doesn't one, in some circumstances and applications, need to present a Work to users as a result on a screen? How is it to be labelled? Why is it inappropriate for the Work entity to contain a title (or really, as you rightly say, there's no reason to insist on only ONE title; perhaps multiple) that the application can use, when convenient or useful, to label the Work group? That seems useful to me, and even if it isn't useful, it is certainly not any kind of intrinsic violation of the conceptual model. If it makes sense for your application, there is no reason not to do it.

(In legacy library data, the 'uniform title', or some portion of it, sometimes serves as a "work title", although at other times it is used for other things, which makes it very confusing to figure out how to use the data. At least if you use something consistently, with the semantics of a work title, you're still within the bounds of the FRBR model.)

I agree entirely (and this was really the point that started this thread) that any assertion that is true of every Expression of a Work, not just every existing one but every _possible_ one (or, to say the same thing another way, of every Manifestation of any Expression of that Work), is best modelled as a property of the Work. Some systems (like our legacy MARC-based systems) may instead model it as an identical assertion duplicated on every single Manifestation. This is semantically the same thing, and it kind of sort of works, but it is not very tidy or maintainable, and it makes your data much harder to use for many use cases. (Of course, since it's semantically the same thing, it could always be transformed into the former.)
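For what it's worth, here is a minimal Python sketch of the labelling case, assuming a Work record that carries an identifier plus zero or more assigned titles. The `Work` class, its field names, and `display_label` are hypothetical illustrations, not anything taken from OL or the FRBR report; the point is only that an application can label a Work group from an assigned title when one exists, and fall back to the identifier for untitled works.

```python
from dataclasses import dataclass, field


@dataclass
class Work:
    """Hypothetical Work record: an identifier plus zero or more assigned titles."""
    work_id: str                                     # any unique ID (OL, ISTC, local record number, ...)
    titles: list[str] = field(default_factory=list)  # assigned, not transcribed; may be empty


def display_label(work: Work) -> str:
    """Label a Work group for display.

    Uses the first assigned title when there is one; untitled works fall
    back to their identifier, so every Work can still be shown on screen.
    """
    if work.titles:
        return work.titles[0]
    return f"[untitled work {work.work_id}]"


if __name__ == "__main__":
    titled = Work("W1", ["The Adventures of Tom Sawyer", "Tom Sawyer"])
    untitled = Work("W2")
    print(display_label(titled))    # The Adventures of Tom Sawyer
    print(display_label(untitled))  # [untitled work W2]
```

Allowing a list rather than a single title sidesteps the question of which title is "the" title; the application just picks whichever label suits it.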
But really, that was my suggestion that began this part of the thread: there is absolutely no need for some separate entity representing "the whole thing", because the Work entity already represents it, as a necessary consequence of the FRBR model. Using the Work entity for this is a perfectly consistent, maintainable, clear, and reusable way to model such assertions, with no downsides I can think of.

I also agree that something like an author is BEST modelled as a relationship to an Author entity, for the reasons you give. Some simple or constrained systems may not be able to do that, though, and may instead just slap an author name string onto a Work object as a property, instead of, as we say in the library world, 'controlling' the author. Even if you're doing that out of necessity or desperation, if you otherwise generally try to align your modelling with FRBR, you'll still get benefits in interoperability with other people who have FRBR-ish data, among other benefits. The FRBR model isn't a straitjacket or an all-or-nothing thing; it's a conceptual framework.

This probably has nothing to do with OL anymore, but it's an interesting conversation.

________________________________________
From: [email protected] [[email protected]] On Behalf Of Lee Passey [[email protected]]
Sent: Wednesday, November 17, 2010 7:04 PM
To: Open Library -- technical discussion
Subject: Re: [ol-tech] New treatment of frbr:manifestation in work RDF

On Mon, November 15, 2010 5:34 pm, Jonathan Rochkind wrote:
> I mostly agree with Lee's general analysis. Except I'd note: Just
> because the FRBR document doesn't give a Work an author or title,
> doesn't mean we can't or shouldn't.

Let me try to distill much of this reply into two assertions that I think we are in complete agreement on:

1. Every work has at least one "creator" and possibly multiple "contributors." (Even if we don't know the identity of the creator, we still know that s/he must have existed.)
2. Every assertion that can be said to be true for every expression of a work is, and should be, a property of the Work and not of one or more of its expressions.

Now, every bibliographic system worth its salt will capture and preserve the foregoing information; the only question is /how/ it is captured and preserved.

The FRBR specification defines 10 entities grouped into three categories. These three categories can generally be considered as the entities dealing with creative works (group 1), the entities dealing with authors and creators (group 2), and the entities dealing with subject matter (group 3). All entities have attributes (properties), generally manifested as name/value pairs, where the "name" is the name of the attribute in the entity definition (e.g. 'Title') and the "value" is the data associated with the attribute for a given instance of the entity (e.g. 'The Adventures of Tom Sawyer'). Entities may not be the attribute (property) values of other entities, but an entity may have an attribute (property) which is a relationship to another entity.

This distinction between attribute values and entity relationships is somewhat artificial, but I believe it is based on the valid notion that an entity is a complex object which itself possesses a collection of attributes. It is at least inefficient to expect a "Work" entity to maintain all the properties of every "Person" entity that contributed to its creation, and it is highly error-prone for multiple "Work" entities to /all/ attempt to maintain the same data repetitively. I don't think it is inappropriate for a "Work" entity to maintain the identities of the entities involved in its creation, but recording an author's name is an inadequate way to do it. A much better way is to assign a Universally Unique IDentifier to each author, and store that as a "CreatedBy" attribute on the work.
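A minimal Python sketch of the "CreatedBy" idea, assuming creators live in their own Person records keyed by a UUID (generated here with the standard `uuid` module). The `Person` and `Work` classes, the `created_by` field, and `creator_names` are hypothetical names for illustration only. The Work stores identifiers, never names, so each name is kept in exactly one place.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Person:
    """Hypothetical Group 2 entity: the name is stored here and nowhere else."""
    name: str
    person_id: uuid.UUID = field(default_factory=uuid.uuid4)


@dataclass
class Work:
    """Hypothetical Work: records who created it by identifier, never by name."""
    work_id: uuid.UUID = field(default_factory=uuid.uuid4)
    created_by: list[uuid.UUID] = field(default_factory=list)


def creator_names(work: Work, people: dict[uuid.UUID, Person]) -> list[str]:
    """Resolve a Work's CreatedBy identifiers to names via the Person registry."""
    return [people[pid].name for pid in work.created_by]


clemens = Person("Samuel Clemens")
people = {clemens.person_id: clemens}

# Two works reference the same Person; neither copies the name.
work_a = Work(created_by=[clemens.person_id])
work_b = Work(created_by=[clemens.person_id])

# Changing the name once changes it for every work that references the person.
clemens.name = "Mark Twain"
print(creator_names(work_a, people))  # ['Mark Twain']
print(creator_names(work_b, people))  # ['Mark Twain']
```

Because neither work copies the name, a correction on the Person record is immediately reflected for both; storing the name string on each Work would instead mean hunting down and editing every copy.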
Some may consider it a semantically trivial distinction to say that "a work contains an author" as opposed to "a work contains an author identification," but I consider it a highly important distinction if you wish to maintain the proper perspective between entities.

> I think the FRBR document probably _should_ have allowed such attributes.
> It won't be a _transcribed_ title or author, because a Work is an abstract
> thing, there is nothing to transcribe. It might not fit into _library_
> workflow to assign a title or author to a Work.
>
> But a Work still has a creator, and still can have an assigned (not
> transcribed) title labelling what the work is. If it's not in the
> official documented FRBR list of attributes, oh well, we can add it
> ourselves anyway if we need it -- to me it seems adding extra attributes
> to entities still used largely as FRBR intended is fine, it won't make
> your data incompatible with anyone else's FRBR data.

Actually, the FRBR specification provides that a "Work" entity /may have/ a "Title" property, much to my dismay. I disagree with this schema for a number of reasons. In the first instance, a work need not have a title of any kind; there are many, many untitled works in existence. In the second instance, a work may, over the course of its virtually unlimited life-span, have many different titles, no one of which can be considered authoritative; I am occasionally struck, when watching "The Daily Show with Jon Stewart", to hear an author state, "that's not the title I gave it, the publisher insisted on that one." In the third instance, a title cannot hope to provide a unique and unambiguous reference to a specific work. I suspect that it is this last objection you are referring to when you distinguish between an "assigned" title and a "transcribed" title. Clearly, a Work needs a unique identifier, if for no other reason than to facilitate the creation of relationship attributes.
The identifier may be Universally Unique (e.g. ISTC:A02200900000A88F or OL:OL53919W) or Locally Unique (e.g. record number 24419 in my database), but it must have some assurance of uniqueness to be useful ("Tom Sawyer" just won't cut it). I believe that what you call an "assigned title" is what I call a "unique identifier." I try to avoid the word "Title" because of its semantic baggage; I'd bet that when you use the word "Title," everyone reading hears "transcribed title." And if I've misread you, I'm sure you'll let me know ;-).

So, the way I've modeled my own database is something like this. I have tables for Actors, Events, Works, Expressions, and Manifestations, each of which has a unique ID property. The Expressions table has a Foreign Key constraint on the Works table, so an Expression record cannot exist without referencing a specific Work record. There is no reciprocal column in the Works table, which limits me to a one-to-many relationship between Works and Expressions. My Works table has but two columns: an auto-generated Identity column, which is guaranteed to be locally unique, and a "notes" field designed to hold unqualified free-form text relating to a work as a whole in all its iterations.

The Events table captures date/time and location information, but it also serves a subject-verb-object type of function. In the context of the current discussion, I use it to relate a work to an author via a "creation" event. In the case of Mark Twain's Tom Sawyer, a record may indicate "during [some period of time] at [some location] the subject [LUID of Samuel Clemens] did [CREATE verb] the object [LUID of Tom Sawyer]." (As the FRBR spec points out, relations among entities tend not to be limited to certain other entities. An Actor (in my parlance, a FRBR Group 2 entity) can be associated with Works, Expressions, Manifestations and even Items in different roles, and their numbers may increase at each level.)
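Something like the following could approximate that layout. It is a rough sketch using Python's built-in `sqlite3` module, not Lee's actual schema: the table and column names (`actor_id`, `verb`, `object_id`, `period`, `place`, and so on) are assumptions, and the Manifestations table is omitted for brevity. It shows the Expressions foreign key refusing an orphan row, and a CREATE event linking an Actor to a Work.

```python
import sqlite3

# Hypothetical schema approximating the tables described above (Manifestations
# omitted); every column name other than the IDs and "notes" is an assumption.
SCHEMA = """
CREATE TABLE works (
    work_id INTEGER PRIMARY KEY,   -- auto-generated, locally unique
    notes   TEXT                   -- free-form text about the work as a whole
);
CREATE TABLE expressions (
    expression_id INTEGER PRIMARY KEY,
    work_id       INTEGER NOT NULL REFERENCES works(work_id)  -- one work, many expressions
);
CREATE TABLE actors (
    actor_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
CREATE TABLE events (              -- subject / verb / object, plus when and where
    event_id   INTEGER PRIMARY KEY,
    subject_id INTEGER NOT NULL REFERENCES actors(actor_id),
    verb       TEXT NOT NULL,      -- e.g. 'CREATE'
    object_id  INTEGER NOT NULL,   -- a work, expression, manifestation or item ID
    period     TEXT,
    place      TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked
conn.executescript(SCHEMA)

work_id = conn.execute("INSERT INTO works (notes) VALUES (NULL)").lastrowid
actor_id = conn.execute("INSERT INTO actors (name) VALUES ('Samuel Clemens')").lastrowid
conn.execute(
    "INSERT INTO events (subject_id, verb, object_id) VALUES (?, 'CREATE', ?)",
    (actor_id, work_id),
)
conn.execute("INSERT INTO expressions (work_id) VALUES (?)", (work_id,))

# An Expression that points at a non-existent Work is refused.
try:
    conn.execute("INSERT INTO expressions (work_id) VALUES (999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Who created this work? Follow the CREATE events back to the actors.
creators = conn.execute(
    """SELECT a.name
       FROM events AS e JOIN actors AS a ON a.actor_id = e.subject_id
       WHERE e.verb = 'CREATE' AND e.object_id = ?""",
    (work_id,),
).fetchall()
print(orphan_rejected, creators)  # True [('Samuel Clemens',)]
```

SQLite only enforces foreign keys when `PRAGMA foreign_keys = ON` is set on the connection, hence the pragma. `object_id` is deliberately left unconstrained in this sketch, since an event may point at a Work, an Expression, a Manifestation or an Item.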
> I would also add, Lee, I think it's totally consistent with your and my
> analysis to in fact give attributes to Works. Consider, as you say:
>
> * "Every assertion|attribute value|property value of or about a work is
> also a valid assertion|attribute value|property value of or about the
> Expression that expresses the work. "
>
> Indeed. So if there's something you want to say which is _inherently_
> true about all Expressions (and their Manifestations and THEIR Items) of
> a Work, the proper place to say it is about the Work. Then it is true
> of all past and future EMI of the Work, just as you intended.

Absolutely. One of the best examples of this, I think, is the kind of free-form text recommendation you regularly see on library-oriented social networking sites: "I liked this book, so you will too." Rarely is this kind of comment directed at a particular edition or printing; it's intended as a comment on the work as a whole, and every derivation thereof, and should therefore be stored and transmitted as part of that entity.

Most discussion (dissension?) about the proper assignment of attributes arises when trying to answer the question: "Does the property I have identified as being possessed by the instance of a work I now hold in my hand belong conceptually at the Item level, the Manifestation level, the Expression level, or the Work level?" When the answer to that question ends up being "wrong" (defined as "not the way I would have done it"), I think it is usually the result of not being able to see the trees for the forest. My own heuristic is to ask, "Is this assertion true (every attribute|property has the same value) of every associated (abstract) instance of this Entity?" If so, the property probably belongs at the next higher level; move it and ask again at that level.
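The heuristic lends itself to a mechanical first pass. The Python below is a hypothetical sketch, not anything from an existing system: `Node` and `hoist` are invented names, and it can only compare the instances you actually have, whereas the real test is every _possible_ instance, so anything it moves up is a candidate for a human to confirm. Working bottom-up, it moves any property with the same value on every child to the parent, then repeats the comparison one level higher.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    """Hypothetical entity in a Work/Expression/Manifestation/Item tree."""
    level: str
    props: dict[str, Any] = field(default_factory=dict)
    children: list["Node"] = field(default_factory=list)


def hoist(node: Node) -> None:
    """Move properties shared by every child up to this node, bottom-up."""
    for child in node.children:
        hoist(child)  # settle the lower levels first, then ask again here
    if len(node.children) < 2:
        return  # one instance says nothing about other possible instances
    shared = dict(node.children[0].props)
    for child in node.children[1:]:
        shared = {k: v for k, v in shared.items() if k in child.props and child.props[k] == v}
    for key, value in shared.items():
        if key in node.props and node.props[key] != value:
            continue  # conflicting value already at this level; leave it alone
        node.props[key] = value
        for child in node.children:
            del child.props[key]


rec = "I liked this book, so you will too."
work = Node("Work", children=[
    Node("Expression", children=[
        Node("Manifestation", {"recommendation": rec, "language": "en", "publisher": "A"}),
        Node("Manifestation", {"recommendation": rec, "language": "en", "publisher": "B"}),
    ]),
    Node("Expression", children=[
        Node("Manifestation", {"recommendation": rec, "language": "fr", "publisher": "C"}),
        Node("Manifestation", {"recommendation": rec, "language": "fr", "publisher": "D"}),
    ]),
])
hoist(work)
print(work.props)              # {'recommendation': 'I liked this book, so you will too.'}
print(work.children[0].props)  # {'language': 'en'}
```

With this made-up data, the shared recommendation ends up on the Work, the language on each Expression, and the publisher stays on each Manifestation.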
It is by asking this question (and by removing relationships to other entities that are modeled outside of the entity itself) that I have arrived at the conclusion that a unique ID (assigned title?) is pretty much all you need in a work record.

_______________________________________________
Ol-tech mailing list
[email protected]
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
To unsubscribe from this mailing list, send email to [email protected]
