Re: [ol-tech] Improving Open Library RDF output - correction / Types & Schema

Ben Companjen Sun, 05 Feb 2012 17:02:27 -0800

On 5 February 2012 17:06, Karen Coyle <[email protected]> wrote:
>
> On 2/2/12 4:24 AM, Ben Companjen wrote:
>
>>
>> I looked at the underlying type schema at http://openlibrary.org/type
>> and it seems that there is no "contributor" type. There is
>> /type/edition/contributions (an array of strings), which may be where
>> contributors are stored when I manually enter translators and
>> designers, though how exactly the role and contributor name would be
>> stored in this string is unclear to me.
>
>
> OL doesn't currently store roles, AFAIK. The data element for roles for
> authors was added later, but most of the incoming data doesn't include
> roles, and none of it includes roles for authors (only contributors).


I was talking about roles that one can add to contributors of an
edition. They must be stored somewhere, because OL remembers what I
enter there. 
http://openlibrary.org/works/OL16419933W/Een_kleurige_wiskundige_wereld
not only shows names, but also roles of contributors. The RDF
currently only shows the names.
Just to be precise: 'data element for roles for authors' =
/type/author_role ? It's too bad I cannot easily view the rationale
for each element, because I wondered what this element/type was
supposed to do and how this element/type is used. Or can I see it
somewhere?

>
>> When looking at some MARC records for
>> http://openlibrary.org/books/OL21322149M/Mies_van_der_Rohe (author
>> "The Museum of Modern Art"), indeed MoMA is in the 110 field (I
>> learned yesterday that field is for corporate entities) and the author
>> is in the "by statement", shown by OL too, as "by Philip C. Johnson."
>> I don't want to accuse anybody, but this leads me to think perhaps
>> ImportBot doesn't know how to import this. Or maybe this record was
>> imported (November 1, 2008) before the decision was made?
>
>
> It looks to me like this record was imported correctly based on the
> algorithm. The "by" statement unfortunately is just free text so no data
> elements are taken from it. Johnson is included as an added author (a 700 in
> the MARC record). A big problem with the library data format is that you
> don't know the relationship of the person listed in the 700 to the item
> being cataloged: it could be the author of a part (like a chapter or intro),
> it could be a co-author, it could be a conductor of a piece of music, etc.
> There are some serious issues with the library data as it exists today, and
> these limit what you can know from the metadata you receive.
>
Oh, I missed the 700 field. But even if the record was imported
correctly, and it looks to me that all the other editions were created
from similar records, how does MoMA end up as the author? Was it
perhaps WorkBot then?

I quite liked this post about free text in MARC fields - and then
noticed you had commented:
http://robotlibrarian.billdueber.com/isbn-parenthetical-notes-bad-marc-data-1/
It mentions the many ways books are described as hardcover or
paperback and that made me wonder why OL wants that description as
free text. The options could be limited, I think. Or does OL
automatically normalize the input to "hardcover", "paperback", "...",
like the example descriptions?
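
To make the question concrete, here is a minimal sketch of the kind of
normalization I have in mind. It is not how OL works as far as I know:
the function name normalize_format, the two canonical labels and the
spelling variants are all assumptions chosen for illustration.

```python
import re

# Hypothetical controlled vocabulary: the labels and the variants they
# match are illustrative only, not a list taken from Open Library.
FORMAT_PATTERNS = [
    ("hardcover", re.compile(r"\b(hard\s*(cover|back|bound)|hbk?|hc)\b")),
    ("paperback", re.compile(r"\b(paper\s*back|soft\s*(cover|back|bound)|pbk?)\b")),
]


def normalize_format(text):
    """Return a canonical format label, or None if nothing matches."""
    # Lower-case and drop periods so "Pbk." and "pbk" look the same.
    cleaned = text.lower().replace(".", " ")
    for label, pattern in FORMAT_PATTERNS:
        if pattern.search(cleaned):
            return label
    # Unknown descriptions are left for a human to look at.
    return None
```

Anything that does not match stays unclassified, so a note such as
"alk. paper" (which describes the paper, not the binding) is not
mistaken for a paperback.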

On a side note: I just had a wild idea: don't show the "by statement"
field on the edit form, not even in the librarian mode, if it is
empty, so that no one is tempted to put anything in it.

>>>
>> Searching for authors with "museum" in the name yields 5608 results,
>> many having over 100 works attached :)
>
>
> Yes, these are from an input source that we now regret having imported. The
> input source used the MARC record format, but used it incorrectly. When it
> went through the normal processing, those errors followed through.
>
>
>> The work type doesn't have a contributors/collaborators field at all.
>
>
> No, it isn't supposed to. A work has creators. Contributors are associated
> with expressions and manifestations. This all comes from something called
> "FRBR"
>
Thanks for the pointers. I had already read a few things on FRBR, and
I'll take it from you that Works only have authors/creators. I guess
it makes sense that they do.

It's just that I wasn't sure where the corporate identities should go
in the Open Library if I were to add for example a publication
"authored" by a government (no references to a human author) manually.
The same question would arise if I were to edit the example of the MoMA
book about Mies van der Rohe (maybe not the best example, as Philip
Johnson becomes author and MoMA doesn't need to be a contributor).
Can I have an OL Work without an author, and put the government or MoMA in
a contributor field in the OL Edition(s)? OL Edition contributors are
just saved and treated as strings (which I find a little
disappointing), so entering a corporate identity there isn't a
problem. The RDF template outputs these contributors as foaf:Persons,
though.
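
As a rough sketch of an alternative, the output could fall back to
foaf:Agent whenever it is not known whether a contributor is a person
or an organization. The function contributor_turtle and the optional
"kind" value are hypothetical; OL's contributor strings carry no such
information today, and this is not the actual RDF template.

```python
# foaf:Person, foaf:Organization and foaf:Agent are real FOAF classes;
# the "kind" keys used to select them are made up for this sketch.
FOAF_CLASS = {
    "person": "foaf:Person",
    "organization": "foaf:Organization",
}


def contributor_turtle(name, kind=None):
    """Return a Turtle blank node describing one contributor."""
    # Without a known kind, use the more general foaf:Agent.
    rdf_class = FOAF_CLASS.get(kind, "foaf:Agent")
    # Escape backslashes and double quotes for a Turtle string literal.
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'[ a {rdf_class} ; foaf:name "{escaped}" ]'
```

With this, contributor_turtle("Museum of Modern Art") is typed as a
foaf:Agent rather than being declared a person.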

Your proposed field for "responsible organization" could help here.
But should it be part of OL's Work or Edition type?

>>  From a linked data perspective it would be nice if corporate entities
>> (including publishers, although not all publishers are corporate
>> entities) are not just strings, but real entities.
>
>
> Yes, it would be nice, but the data unfortunately doesn't always support it.
> The corporate entities that come in on 110/710 fields are entities in the
> library world and can be found in VIAF with identifiers. The publishers are
> NOT entities, but are a transcription of how the publisher or imprint name
> was presented on the title page of the book. The imprint name != publisher
> name, so connecting these is difficult. Edward Betts did some
> experimentation around this at one point, but the results were very fuzzy.

If we (users) want it, and if the data model supports it, I think we
can make the data support it, programmatically or manually. I'm from
the Discogs world, in which everything is done manually by people
mostly smart enough to match imprints (a.k.a. labels) to
entities/publishers behind those labels. Even recording, mixing and
mastering studios and the companies behind the labels (mentioned as
copyright holders) are matched. Guidelines and some rules are needed
there, but they seem to work.
Sure, Open Library is not Discogs and the world of books is different
from the world of recorded music (one important reason, I guess, is
that the former is much older), but the goals of OL and Discogs are
similar (one page for every book/music record) and the means
(collaborative editing) are too. But I'm not here to just promote
Discogs - I like MusicBrainz very much too ;-)

>>>
>> I would only consider putting a corporate body in the author field if
>> a human author is not mentioned in the book at all, which is very
>> rarely the case.
>
>
> Not so rare, actually. Most documents out of corporations and government
> bodies don't attribute the document to a human. But aside from that, much of
> the data in OL is based on rules used by Anglo-American libraries to make
> these decisions. Many of us could see logic in other ways of doing things.
>
> The difficulty is getting enough consistency to do the merging between
> editions and works. That's one of the reasons why we put corporate bodies in
> collaborator -- how could you possibly explain when a corporate body could
> be an author in a way that folks could easily understand? The rules are over
> a thousand pages long, and if you want to delve into that, here's a zip file
> with the final draft:
>
> http://www.archive.org/details/ResourceDescriptionAccessrdaDraftNov.2008

Just to make sure I understand you correctly: by "collaborator" you
mean "contributor" in the OL Edition? Or a role at the "creator"
level?
I may have a look at the RDA rules (I had signed up for access to the
final rules in the trial period, but was quickly scared away by the
extent of the documents). About authorship: I believe that in the
Netherlands, by default, you don't own the author's rights (~copyright)
for a publication if you wrote it as part of your job - the
organization you work for owns the copyright in that case. So I don't
find it hard to understand that the human author of a publication is
hidden. But that may be just me.

>>
>> VIAF could be useful for disambiguation, but there is no obvious way
>> to enter a VIAF ID (or any other URI for a person) in an OL record.
>
>
> This seems to me like a good feature request. Note that VIAF and Wikipedia
> have some mutual linking.
>
The LOD cloud diagram indeed shows an arrow from VIAF to DBpedia. And
the Wikipedia article about J.K. Rowling not only has a link to VIAF,
there is one to OL as well :) This should only make things easier to
link.

I'll soon open an issue requesting a way to enter VIAF IDs.

>> Is
>> this what http://openlibrary.org/type/author/uris is for? That could
>> make Open Library less isolated in the LOD cloud [2]. I have added
>> links to VIAF pages to a few authors, but they are probably in
>> /type/author/links.
>> It seems VIAF has at least 4 entries for the MoMA (New York), by the way.
>
>
> VIAF takes data from about 20 different national library systems and
> clusters the headings for the same entity, where it can. A match on any
> entity in a cluster should be linked to the cluster ID. The MOMA example
> shows the difficulty of creating the clusters algorithmically. I am hoping
> that VIAF will eventually allow human merging of entries.
>
>
>> A (software) agent consuming RDF should not have a hard time figuring
>> out that a foaf:Person in its knowledge base is the same as a
>> foaf:Agent in Open Library, so I don't think this is the best reason
>> to not change to foaf:Agent - not losing specificity is a better
>> reason :)
>
>
> That's if you think that reasoning will be a common feature of RDF software.
> Some folks have doubts.
>
Sindice.com does some simple inferencing: it can add the superclasses
of resources in a search result. Not all RDF software will do
reasoning of course, but I believe in the world of catalogs there will
be useful software that does do it.
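
The superclass inference meant here needs very little machinery. Below
is a minimal sketch, not Sindice's implementation: the function
inferred_types and the SUBCLASS_OF table are my own names, and the table
only holds the two FOAF axioms relevant to this thread (foaf:Person and
foaf:Organization are both subclasses of foaf:Agent).

```python
# Subclass axioms, keyed by subclass. Only the FOAF axioms needed for
# this discussion are listed; a real reasoner would load the schema.
SUBCLASS_OF = {
    "foaf:Person": {"foaf:Agent"},
    "foaf:Organization": {"foaf:Agent"},
}


def inferred_types(asserted):
    """Return the asserted types plus all of their superclasses."""
    result = set(asserted)
    stack = list(asserted)
    while stack:
        cls = stack.pop()
        # Follow subclass links transitively, visiting each class once.
        for parent in SUBCLASS_OF.get(cls, ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result
```

A resource asserted as foaf:Person then also counts as a foaf:Agent, so
a consumer that applies this step can match the two descriptions.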

Ben

> kc
>
>
>
> --
> Karen Coyle
> [email protected] http://kcoyle.net
> ph: 1-510-540-7596
> m: 1-510-435-8234
> skype: kcoylenet