Re: [ol-tech] Improving Open Library RDF output - correction

Karen Coyle Wed, 01 Feb 2012 06:56:21 -0800


On 1/31/12 5:05 PM, Ben Companjen wrote:
> Hi Karen,
>
> Thanks for your comment, but I am not sure about saying "these (few)
> non-persons can be foaf:Persons". This may be an issue that cannot be
> easily fixed in the RDF views though.
> Is there a way in the system (and database) to say Rijksmuseum Twenthe
> is not a person or to specifically say it is an organization?


There isn't now in OL. As I said, the decision was to put non-Person 
creators in contributor. However, that was based on library data, which 
does distinguish between persons and corporate entities. I don't know 
where the corporate entities as creators are coming from - perhaps from 
Amazon? (There also was a batch of library data that didn't make the 
appropriate distinction.) If so, there may not be a way to tell which 
they are. Ross says that there are many corporate bodies in the creator 
field... so I can only imagine that they come in from one of the major 
data sources.

Following the general practice of trying not to lose any specificity, it 
would then be best:
- where persons are identified as persons, keep that information in the 
OL data. There would still be some stray non-persons that are mis-coded 
in the input data.
- where corporate entities are identified as such, code them as such in 
OL, rather than putting them in collaborator, which also gets personal 
names.
- where it isn't possible to distinguish, give them a general "agent" 
designation in OL
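
As a rough illustration of the three cases above, here is a minimal sketch of how an RDF view might pick a class once OL can record the distinction. The `CreatorKind` enum and the `foaf_class` function are hypothetical names, not existing OL code; the FOAF class URIs are the standard ones.

```python
from enum import Enum

FOAF = "http://xmlns.com/foaf/0.1/"


class CreatorKind(Enum):
    """Hypothetical marker for what OL knows about a creator record."""
    PERSON = "person"              # identified as a person in the input data
    ORGANIZATION = "organization"  # identified as a corporate entity
    UNKNOWN = "unknown"            # the source did not make the distinction


def foaf_class(kind: CreatorKind) -> str:
    """Return the most specific FOAF class URI that the data supports."""
    if kind is CreatorKind.PERSON:
        return FOAF + "Person"
    if kind is CreatorKind.ORGANIZATION:
        return FOAF + "Organization"
    # Fall back to the general superclass when nothing more is known.
    return FOAF + "Agent"
```

Because foaf:Person and foaf:Organization are both subclasses of foaf:Agent, the fallback stays correct for every record while losing no specificity where it exists.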

This would mean adding some fields to OL. However, I really do think 
that person-hood is an important concept and one worth keeping. It's a 
bit complex because it means having two kinds of "collaborator" - person 
and corporate entity. Perhaps there could be a "responsible 
organization" added to take the known corporate entities?


> When there is, it is easy to change the RDF template accordingly. Of course
> I can revert foaf:Agent to foaf:Person in my changeset, but in these
> special cases the RDF will be 'wrong'. My proposal may 'just' be less
> precise, as foaf:Person is a subclass of foaf:Agent, unless other
> properties are used that only apply to persons.
> I don't know of any rule / guideline that says organizations should be
> added as collaborators instead of creators, or of a way to enter the
> information this way.

That was built into the automated import process. There was a reluctance 
to provide "rules" for user input to OL, and I think there may have been 
the assumption that few users would consider corporate bodies as 
authors. That's something that is common in library data but that you 
don't see in other environments. When I look up "Rijksmuseum Twenthe" in 
Amazon it does sometimes appear as an author, but only in third-party 
entries, and those use the library heading "Rijksmuseum Twenthe 
(Netherlands)" so I assume the data was copied from library records. 
Amazon itself does not appear to use corporate bodies as authors.

>
> What exactly do you mean by linking?

Linking as in "linked data." Linking to other person information "in the 
cloud," such as the Virtual International Authority File (VIAF) [1] or 
DBPedia. Since OL is a large store of bibliographic data, others may 
want to link to persons in OL to pick up bibliographic data with that 
person as creator. This is more direct if these can be identified using 
'foaf:Person'. I'm not sure that those linking will know to link 
personal names in their data to foaf:Agent in OL.

BTW, linking to VIAF may be a way to disambiguate the corporate bodies 
from the persons, since they will be coded differently in that data.

kc

[1] http://viaf.org

>
> If you have more remarks, I'd like to hear them too. :)
>
> Regards,
>
> Ben
>
> On 31 January 2012 21:20, Karen Coyle<[email protected]>  wrote:
>> I looked again and had remembered wrong: the change was to be between
>> foaf:Person and foaf:Agent. My reply is still the same: Person is what
>> is intended, and Person will get us better linking.
>>
>> kc
>>
>> On 1/30/12 3:55 PM, Ben Companjen wrote:
>>> Hi all,
>>>
>>> I just opened issue 136 on Github, which is a pull request to change
>>> some things in the RDF templates. These things have already been
>>> proposed on this list. One change since my last email: URI references
>>> for Works, Editions and Authors have no trailing "/".
>>> Please see https://github.com/internetarchive/openlibrary/issues/136
>>> for details.
>>>
>>> I had already opened issue 130 about changing the HTTP behavior when
>>> requesting RDF (actually, for any type). In essence it is about using
>>> 303 redirects instead of 301 or 200 depending on the requested
>>> mimetype.
>>> Please see https://github.com/internetarchive/openlibrary/issues/130
>>> for details.
>>>
>>> Regards,
>>>
>>> Ben
>>>
>>> On 13 January 2012 23:02, Ben Companjen<[email protected]>    wrote:
>>>> Summary of text below: it does matter for RDF to have consistent URIs
>>>> (URI with "/" is different from URI without "/"); using URIs with '/'
>>>> for OL resources fits better in current web server configuration.
>>>> There is content negotiation, but in July 2010 it was suggested to do
>>>> it differently (it is still the same) and it is not as 'Accept'ing as
>>>> I would like.
>>>> ----------------------------------------------------------------------------------------------------------------------
>>>>
>>>> On 13 January 2012 01:52, raj kumar<[email protected]>    wrote:
>>>>>
>>>>> On Jan 12, 2012, at 2:18 PM, Karen Coyle wrote:
>>>>>
>>>>>> I don't remember at this point what this was about, presumably a
>>>>>> comment that came in on the list. It MAY have been a reference to the
>>>>>> URI/Ls in the namespace section of the XML. Since we can't seem to
>>>>>> find a problem, I'd say we should ignore it. If it matters, it'll come
>>>>>> up again.
>>>>>
>>>>> Ah, I remember... Yes, this is specific to Namespace URIs in RDF.
>>>>>
>>>>> Namespace URIs should end in either / or #, so that RDF URI Refs can be 
>>>>> constructed by concatenation of the Namespace URI and a local name 
>>>>> without adding any separators.
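
To illustrate the concatenation rule, here is a minimal sketch. The `expand` helper is a hypothetical name; the two namespace URIs are the standard FOAF and RDF ones.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def expand(namespace: str, local_name: str) -> str:
    """Build a full URI reference by plain concatenation, with no separator."""
    return namespace + local_name


person = expand(FOAF, "Person")
rdf_type = expand(RDF, "type")

# Without the trailing "/" the local name runs into the last path segment.
broken = expand("http://xmlns.com/foaf/0.1", "Person")
```

Here `person` is the intended foaf:Person URI, while `broken` ends in "0.1Person", which is why the namespace URI itself has to carry the final "/" or "#".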
>>>>>
>>>>> This concat operation is defined here: 
>>>>> http://www.w3.org/TR/REC-rdf-syntax/
>>>>>
>>>>> Ben, can you send a pull request with only the change of appending the 
>>>>> trailing '/'?
>>>>>
>>>>> Thanks!
>>>>> -raj
>>>>>
>>>> Hi Raj,
>>>>
>>>> I'm not sure I understand you here, seeing that all namespace URIs
>>>> already have a trailing / or #. I was talking about other 'URI
>>>> references', the URIs used to identify and make statements about
>>>> resources (Works, Editions and Authors). These are not prefixes, but
>>>> complete URIs.
>>>>
>>>> I looked through the URI RFC <http://www.ietf.org/rfc/rfc2396.txt>,
>>>> but found no concrete information that
>>>> <http://openlibrary.org/books/OL18215289M/> and
>>>> <http://openlibrary.org/books/OL18215289M> should be interpreted as
>>>> being equivalent (or that they shouldn't).
>>>> Following Lee's analogy: I don't know if "Ben/" should be interpreted
>>>> by RDF agents as being equivalent to "Ben". If they should, in theory
>>>> the discussion about URI references with or without trailing slashes
>>>> is irrelevant, even though I'd say (if they were references to me) I'd
>>>> prefer "Ben". :)
>>>> People on <http://answers.semanticweb.com/questions/13827/is-a-uri-with-trailing-different-from-uri-without>
>>>> say the identifiers are different. So it is important to be consistent
>>>> in assigning URIs to resources. RDF applications that use the current
>>>> OL RDF output will not be able to link the data from the Work RDF
>>>> (URIs with /) to the data from the Edition RDF (URIs without /).
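
A small sketch of why the mismatch matters, assuming an RDF consumer that compares URI references as plain strings. The property names and values are placeholders for illustration, not actual OL output, and the `merge` helper is hypothetical.

```python
with_slash = "http://openlibrary.org/books/OL18215289M/"
without_slash = "http://openlibrary.org/books/OL18215289M"

# Placeholder statements keyed by subject URI, one dict per RDF view.
work_view = {with_slash: {"ex:isEditionOf": "ex:someWork"}}
edition_view = {without_slash: {"ex:title": "placeholder title"}}


def merge(*views):
    """Merge statements; subjects are joined only when the URIs are identical."""
    merged = {}
    for view in views:
        for subject, properties in view.items():
            merged.setdefault(subject, {}).update(properties)
    return merged


merged = merge(work_view, edition_view)
```

The merged result holds two separate subjects rather than one, so the statements from the Work RDF never attach to the edition described in the Edition RDF.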
>>>>
>>>> Following the linked data principles, it would be very nice if one can
>>>> look up the URI of something and get it (e.g. electronic documents) or
>>>> get more information about it (e.g. information about a person). Since
>>>> the resources in the Open Library cannot be transferred via HTTP, we
>>>> only want information about the resources. Redirecting requests for a
>>>> non-information resource using HTTP 303 to its HTML, RDF, JSON etc
>>>> representation, based on the Accept header in the request, is common.
>>>> But you probably know this already.
>>>> Using Wireshark I could see the redirect process happening. When I ask
>>>> for <x/> I get a "303 See Other" to <x>, then when I ask for <x> I get
>>>> "301 Permanently moved" to <x/Title>. This is almost what the Cool
>>>> URIs document describes, only content negotiation is not used - as I
>>>> wrote in a previous email, I am always redirected to the HTML
>>>> representation.
>>>>
>>>> At this point in writing this email, I searched the list archives for
>>>> "http 303" and found a two-message conversation from July 2010 about
>>>> using 303 redirects to <x.rdf> instead of returning RDF/XML when
>>>> requesting <x>. Err, content negotiation is already implemented? So I
>>>> tried "Accept: application/rdf+xml" in a request for <x/> and was
>>>> redirected to <x> and served RDF/XML. D'oh!
>>>>
>>>> That changes what I wanted to say. The mail conversation starts at
>>>> <http://www.mail-archive.com/[email protected]/msg00198.html>. Ross
>>>> Singer replies in
>>>> <http://www.mail-archive.com/[email protected]/msg00199.html> that
>>>> redirecting to <x.rdf> seems doable. I agree and would like this to be
>>>> implemented.
>>>> I would like to add that better handling of the Accept header would be
>>>> welcome. If I prefer Turtle (q=1) but like RDF/XML almost the same
>>>> (q=0.9), I'm redirected to the HTML with 301 permanently moved.
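
A minimal sketch of the q-value handling asked for here, not the actual OL server code. The function names and the list of available types are assumptions, and wildcards such as `*/*` are deliberately ignored to keep it short.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media type, q) pairs, best first."""
    preferences = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when the parameter is absent
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        preferences.append((media_type, q))
    # sorted() is stable, so equal q-values keep their header order.
    return sorted(preferences, key=lambda pref: pref[1], reverse=True)


def choose_media_type(header: str, available: List[str]) -> Optional[str]:
    """Return the most preferred media type the server can actually serve."""
    for media_type, q in parse_accept(header):
        if q > 0 and media_type in available:
            return media_type
    return None
```

With `available = ["text/html", "application/rdf+xml"]`, the header `text/turtle;q=1, application/rdf+xml;q=0.9` selects `application/rdf+xml`: Turtle is preferred but not offered, so the next acceptable type is chosen instead of falling through to HTML.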
>>>>
>>>> Finally, getting back to the trailing slashes: although I don't like
>>>> the aesthetics of a trailing /, with these insights, it may be easier to
>>>> make all URI references (to OL resources) in the RDF end with a "/".
>>>>
>>>> Apologies for this long answer - I hope I made myself clear, though. :)
>>>>
>>>> Regards,
>>>>
>>>> Ben
>>> _______________________________________________
>>> Ol-tech mailing list
>>> [email protected]
>>> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
>>> To unsubscribe from this mailing list, send email to 
>>> [email protected]
>>
>> --
>> Karen Coyle
>> [email protected] http://kcoyle.net
>> ph: 1-510-540-7596
>> m: 1-510-435-8234
>> skype: kcoylenet

-- 
Karen Coyle
[email protected] http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
