Re: [ol-tech] Improving Open Library RDF output - correction / Types & Schema

Ross Singer Sun, 05 Feb 2012 19:24:58 -0800

On Sun, Feb 5, 2012 at 8:02 PM, Ben Companjen <[email protected]> wrote:
> On 5 February 2012 17:06, Karen Coyle <[email protected]> wrote:
>>
>> On 2/2/12 4:24 AM, Ben Companjen wrote:
>>
>>>
>>> I looked at the underlying type schema at http://openlibrary.org/type
>>> and it seems that there is no "contributor" type. There is
>>> /type/edition/contributions (an array of strings), which may be where
>>> contributors are stored when I manually enter translators and
>>> designers, though how exactly the role and contributor name would be
>>> stored in this string is unclear to me.
>>
>>
>> OL doesn't currently store roles, AFAIK. The data element for roles for
>> authors was added later, but most of the incoming data doesn't include
>> roles, and none of it includes roles for authors (only contributors).
>
> I was talking about roles that one can add to contributors of an
> edition. They must be stored somewhere, because OL remembers what I
> enter there. 
> http://openlibrary.org/works/OL16419933W/Een_kleurige_wiskundige_wereld
> not only shows names, but also roles of contributors. The RDF
> currently only shows the names.
> Just to be precise: 'data element for roles for authors' =
> /type/author_role ? It's too bad I cannot easily view the rationale
> for each element, because I wondered what this element/type was
> supposed to do and how this element/type is used. Or can I see it
> somewhere?
>
>>
>>> When looking at some MARC records for
>>> http://openlibrary.org/books/OL21322149M/Mies_van_der_Rohe (author
>>> "The Museum of Modern Art"), indeed MoMA is in the 110 field (I
>>> learned yesterday that field is for corporate entities) and the author
>>> is in the "by statement", shown by OL too, as "by Philip C. Johnson."
>>> I don't want to accuse anybody, but this leads me to think perhaps
>>> ImportBot doesn't know how to import this. Or maybe this record was
>>> imported (November 1, 2008) before the decision was made?
>>
>>
>> It looks to me like this record was imported correctly based on the
>> algorithm. The "by" statement unfortunately is just free text so no data
>> elements are taken from it. Johnson is included as an added author (a 700 in
>> the MARC record). A big problem with the library data format is that you
>> don't know the relationship of the person listed in the 700 to the item
>> being cataloged: it could be the author of a part (like a chapter or intro),
>> it could be a co-author, it could be a conductor of a piece of music, etc.
>> There are some serious issues with the library data as it exists today, and
>> these limit what you can know from the metadata you receive.
>>
> Oh, I missed the 700 field. But even if the record was imported
> correctly, and it looks to me that all the other editions were created
> from similar records, how does MoMA end up as the author? Was it
> perhaps WorkBot then?
>
> I quite liked this post about free text in MARC fields - and then
> noticed you had commented:
> http://robotlibrarian.billdueber.com/isbn-parenthetical-notes-bad-marc-data-1/
> It mentions the many ways books are described as hardcover or
> paperback and that made me wonder why OL wants that description as
> free text. The options could be limited, I think. Or does OL
> automatically normalize the input to "hardcover", "paperback", "...",
> like the example descriptions?
>
> On a side note: I just had a wild idea: don't show the "by statement"
> field on the edit form, not even in the librarian mode, if it is
> empty, so that no one is tempted to put anything in it.
>
>>>>
>>> Searching for authors with "museum" in the name yields 5608 results,
>>> many having over 100 works attached :)
>>
>>
>> Yes, these are from an input source that we now regret having imported. The
>> input source used the MARC record format, but used it incorrectly. When it
>> went through the normal processing, those errors followed through.
>>
>>
>>> The work type doesn't have a contributors/collaborators field at all.
>>
>>
>> No, it isn't supposed to. A work has creators. Contributors are associated
>> with expressions and manifestations. This all comes from something called
>> "FRBR"
>>
> Thanks for the pointers. I had already read a few things on FRBR, and
> I'll take it from you that Works only have authors/creators. I guess
> it makes sense that they do.
>
> It's just that I wasn't sure where the corporate identities should go
> in the Open Library if I were to add for example a publication
> "authored" by a government (no references to a human author) manually.
> The same question arises if I were to edit the example of the MoMA
> book about Mies van der Rohe (maybe not the best example, as Philip
> Johnson becomes author and MoMA doesn't need to be a contributor).
> Can I have an OL Work without author, and put the government or MoMA in
> a contributor field in the OL Edition(s)? OL Edition contributors are
> just saved and treated as strings (which I find a little
> disappointing), so entering a corporate identity there isn't a
> problem. The RDF template outputs these contributors as foaf:Persons,
> though.
>
> Your proposed field for "responsible organization" could help here.
> But should it be part of OL's Work or Edition type?
>
>>>  From a linked data perspective it would be nice if corporate entities
>>> (including publishers, although not all publishers are corporate
>>> entities) are not just strings, but real entities.
>>
>>
>> Yes, it would be nice, but the data unfortunately doesn't always support it.
>> The corporate entities that come in on 110/710 fields are entities in the
>> library world and can be found in VIAF with identifiers. The publishers are
>> NOT entities, but are a transcription of how the publisher or imprint name
>> was presented on the title page of the book. The imprint name != publisher
>> name, so connecting these is difficult. Edward Betts did some
>> experimentation around this at one point, but the results were very fuzzy.
>
> If we (users) want it, and if the data model supports it, I think we
> can make the data support it, programmatically or manually. I'm from
> the Discogs world, in which everything is done manually by people
> mostly smart enough to match imprints (a.k.a. labels) to
> entities/publishers behind those labels. Even recording, mixing and
> mastering studios and the companies behind the labels (mentioned as
> copyright holders) are matched. Guidelines and some rules are needed
> there, but they seem to work.
> Sure, Open Library is not Discogs and the world of books is different
> from the world of recorded music (one important reason, I guess, is
> that the former is much older), but goals of OL and Discogs are
> similar (one page for every book/music record) and the means
> (collaborative editing) are too. But I'm not here to just promote
> Discogs - I like MusicBrainz very much too ;-)
>
>>>>
>>> I would only consider putting a corporate body in the author field if
>>> a human author is not mentioned in the book at all, which is very
>>> rarely the case.
>>
>>
>> Not so rare, actually. Most documents out of corporations and government
>> bodies don't attribute the document to a human. But aside from that, much of
>> the data in OL is based on rules used by Anglo-American libraries to make
>> these decisions. Many of us could see logic in other ways of doing things.
>>
>> The difficulty is getting enough consistency to do the merging between
>> editions and works. That's one of the reasons why we put corporate bodies in
>> collaborator -- how could you possibly explain when a corporate body could
>> be an author in a way that folks could easily understand? The rules are over
>> a thousand pages long, and if you want to delve into that, here's a zip file
>> with the final draft:
>>
>> http://www.archive.org/details/ResourceDescriptionAccessrdaDraftNov.2008
>
> Just to make sure I understand you correctly: by "collaborator" you
> mean "contributor" in the OL Edition? Or a role at the "creator"
> level?
> I may have a look at the RDA rules (I had signed up for access to the
> final rules in the trial period, but was quickly scared away by the
> extent of the documents). About authorship: I believe in the
> Netherlands by default you don't own the author's rights (~copyright)
> for a publication if you wrote it as part of your job - the
> organization you work for owns the copyright in that case. So I don't
> find it hard to understand the human author of a publication is
> hidden. But that may be just me.
>
>>>
>>> VIAF could be useful for disambiguation, but there is no obvious way
>>> to enter a VIAF ID (or any other URI for a person) in an OL record.
>>
>>
>> This seems to me like a good feature request. Note that VIAF and Wikipedia
>> have some mutual linking.
>>
> The LOD cloud diagram indeed shows an arrow from VIAF to DBpedia. And
> the Wikipedia article about J.K. Rowling not only has a link to VIAF,
> there is one to OL as well :) This should only make things easier to
> link.
>
> I'll open an issue to request allowing for entering VIAF IDs soon.
>
>>> Is
>>> this what http://openlibrary.org/type/author/uris is for? That could
>>> make Open Library less isolated in the LOD cloud [2]. I have added
>>> links to VIAF pages to a few authors, but they are probably in
>>> /type/author/links.
>>> It seems VIAF has at least 4 entries for the MoMA (New York), by the way.
>>
>>
>> VIAF takes data from about 20 different national library systems and
>> clusters the headings for the same entity, where it can. A match on any
>> entity in a cluster should be linked to the cluster ID. The MoMA example
>> shows the difficulty of creating the clusters algorithmically. I am hoping
>> that VIAF will eventually allow human merging of entries.
>>
>>
>>> A (software) agent consuming RDF should not have a hard time figuring
>>> out that a foaf:Person in its knowledge base is the same as a
>>> foaf:Agent in Open Library, so I don't think this is the best reason
>>> to not change to foaf:Agent - not losing specificity is a better
>>> reason :)
>>
>>
>> That's if you think that reasoning will be a common feature of RDF software.
>> Some folks have doubts.
>>
> Sindice.com does some simple inferencing: it can add the superclasses
> of resources in a search result. Not all RDF software will do
> reasoning of course, but I believe in the world of catalogs there will
> be useful software that does do it.
>
I agree with both of you here (some software will do reasoning, but it
will be an increasingly small share compared with software that doesn't)
-- that said, I think that most agents will know to look for
foaf:Person, foaf:Organization AND foaf:Agent.
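
To illustrate, here is a minimal sketch in plain Python of a consumer
that finds agents without any reasoner, simply by checking for all three
FOAF classes. It assumes the data is already available as
(subject, predicate, object) tuples; the function name find_agents and
the example.org URIs are made up for illustration and are not part of
Open Library or any RDF library.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# In the FOAF vocabulary, Person and Organization are subclasses of Agent.
# Listing all three explicitly avoids the need for subclass inference.
AGENT_CLASSES = {FOAF + "Agent", FOAF + "Person", FOAF + "Organization"}


def find_agents(triples):
    """Return the subjects typed as any FOAF agent class (hypothetical helper)."""
    return {s for (s, p, o) in triples if p == RDF_TYPE and o in AGENT_CLASSES}


# Hypothetical example data; these URIs are invented for the sketch.
triples = [
    ("http://example.org/authors/A1", RDF_TYPE, FOAF + "Person"),
    ("http://example.org/authors/A2", RDF_TYPE, FOAF + "Organization"),
    ("http://example.org/authors/A3", RDF_TYPE, FOAF + "Agent"),
    ("http://example.org/books/B1", RDF_TYPE, "http://example.org/types/Book"),
]

agents = find_agents(triples)
```

With this approach it matters little to the consumer whether a corporate
author is published as foaf:Organization or as the more general
foaf:Agent; only the book resource is skipped.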


-Ross.
> Ben
>
>> kc
>>
>>
>>
>> --
>> Karen Coyle
>> [email protected] http://kcoyle.net
>> ph: 1-510-540-7596
>> m: 1-510-435-8234
>> skype: kcoylenet
> _______________________________________________
> Ol-tech mailing list
> [email protected]
> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
> To unsubscribe from this mailing list, send email to 
> [email protected]