Re: [ol-tech] Improving Open Library RDF output, part 2

Tom Morris Thu, 29 Mar 2012 08:45:02 -0700

On Wed, Mar 28, 2012 at 3:45 AM, Karen Coyle <[email protected]> wrote:
> On 3/27/12 10:06 PM, Ben Companjen wrote:
>
> This is the beginning of the discussion about real-world objects (RWOs)
> vs. bibliographic entities:
>
> http://www.mail-archive.com/[email protected]/msg00076.html
>
> It's an ontological question - what is the entity that is being modeled?


I think you need to model both the author and the bibliographic record
for the author, but the vast majority of the data will be associated
with the author.  Only record-level metadata, such as the "last
updated" field, will be associated with the record.
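A minimal Python sketch of that split, assuming hypothetical class and field names (this is not Open Library's actual schema): the real-world author carries the descriptive data, while the bibliographic record only points at the author and holds record-level metadata such as the last-updated timestamp.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Author:
    """The real-world person: nearly all descriptive data lives here."""
    name: str
    birth_year: Optional[int] = None
    works: List[str] = field(default_factory=list)


@dataclass
class AuthorRecord:
    """The bibliographic record *about* the author (hypothetical shape).

    It holds only record-level metadata plus a link to the entity it describes.
    """
    record_id: str
    describes: Author
    last_updated: datetime


# Illustrative values only; the record ID is a made-up placeholder.
twain = Author("Mark Twain", 1835, ["Adventures of Huckleberry Finn"])
record = AuthorRecord("/authors/EXAMPLE_ID", twain, datetime(2012, 3, 29))
```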

>>> As for using the VIAF ID rather than the individual ID, I'm not entirely
>>> sure about that. As VIAF grows, individual library authority identifiers
>>> can move from one cluster to another. The VIAF id identifies the
>>> cluster, not the individual heading.
>>> The cluster itself does not have a string to match against.
>>
>> I'm not entirely sure what you mean - my understanding of VIAF is that
>> a VIAF ID identifies a person and that the underlying database
>> connects IDs from the individual authority files. I don't know whether
>> VIAF IDs are reused when one becomes obsolete (e.g. after a merge).
>
> We may need to ask, but my understanding is that the VIAF ID identifies
> a cluster of name authority statements that are considered to be for the
> same entity. However, if you look at VIAF you often see more than one
> cluster for the same entity. Presumably these will eventually be
> resolved. The resolution, as I understand it, will be to re-cluster the
> individual name authority entries. I do not know if the previous VIAF ID
> will be redirected to the new cluster. But I am pretty sure that the
> matching actions take place on the individual name authority records,
> not on the clusters.

You're always going to have this issue when cross-referencing among
multiple evolving databases.  Merges are the easy case.  You just have
both identifiers resolve to the same record (redirects, owl:sameAs,
etc.).  The difficult case is handling splits of things which were
wrongly merged in the first place.  If you have authors A, B, & C
where C is actually a mistaken conflation of A & B (i.e. there are
really only two authors), you have no choice but to move the
appropriate books from C to each of A & B and then kill the record for
C, perhaps replacing it with a set of links to the topics that the data
moved to (Freebase does this by having a "split_to" property that
points to the new topics).  There's really no way an application can
handle a split as transparently as it handles a redirect or an
owl:sameAs assertion.
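A minimal sketch of why the two cases differ, assuming a toy in-memory store with hypothetical entry types (the `SplitTo` type only borrows the idea of Freebase's "split_to" property; it is not Freebase's API): a merge can be followed automatically, but a split forces the caller to choose among candidates.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Live:
    data: str            # placeholder for the record's payload


@dataclass
class Redirect:
    target: str          # merge: this ID now points at another record


@dataclass
class SplitTo:
    targets: List[str]   # split: the data moved to several records


Entry = Union[Live, Redirect, SplitTo]


class SplitError(Exception):
    """Raised when an ID was split; the caller must pick a candidate."""

    def __init__(self, ident: str, candidates: List[str]):
        super().__init__(f"{ident} was split into {candidates}")
        self.candidates = candidates


def resolve(ident: str, store: Dict[str, Entry], max_hops: int = 10) -> str:
    """Follow merge redirects transparently; refuse to guess on splits."""
    for _ in range(max_hops):
        entry = store[ident]
        if isinstance(entry, Live):
            return ident
        if isinstance(entry, Redirect):
            ident = entry.target
            continue
        # Only SplitTo remains: there is no single right answer to return.
        raise SplitError(ident, entry.targets)
    raise RuntimeError("redirect chain too long (possible cycle)")


store: Dict[str, Entry] = {
    "A": Live("author A"),
    "B": Live("author B"),
    "C": SplitTo(["A", "B"]),   # C was a mistaken conflation of A and B
    "D": Redirect("A"),         # D was merged into A
}
```

Here `resolve("D", store)` quietly returns `"A"`, while `resolve("C", store)` raises `SplitError` listing `["A", "B"]`, because only a human or extra context can tell which author a given reference to C really meant.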

> The primary issue that I see, however, is that there is no preferred
> form of the name for a cluster - the cluster keeps as preferred forms
> all of the preferred forms from all of the clustered name authority
> records. So at the point that you need to either compare a string to
> something in VIAF, or make use of VIAF, you must operate on the records
> in the cluster - there is no VIAF name data as such. As an ID, VIAF
> identifies a cluster of declarations about a named entity, and those
> declarations can have significant differences.

I don't see the lack of a "preferred" name as an issue.  Who
prefers that name anyway?  The answer is typically some librarian
somewhere, not real-world people.  Plus, the "name" isn't the author's
name but the name of the bibliographic record, with a bunch of other
random data tossed in, like birth dates, pseudonym keywords, etc.

The best name is also culture- and language-dependent, so given that
one has all the preferred names from the national libraries, one
can use the user's language preferences to generate a name by:
 - mapping the user's language to a national library
 - finding the preferred name for that national library
 - using that national library's rules for forming an "authorized"
name to get rid of all the cruft and get a real name
 - applying fallback rules when there's no entry for the language or
national library

A strategy like that allows you to support multiple languages and
cultures.  Of course all the mappings and transformations can be
precomputed for efficiency.
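The strategy above can be sketched in Python as follows. Everything here is an assumption for illustration: the language-to-library mapping, the VIAF-style source codes, the sample headings, and especially `strip_cruft`, which is a crude regex stand-in for each library's real heading rules (it drops life dates and parenthetical qualifiers and inverts "Surname, Forename"). The hypothetical `precompute` helper builds the per-language table ahead of time.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical mapping from a user's language to VIAF-style source codes.
LANG_TO_LIBRARIES: Dict[str, List[str]] = {
    "en": ["LC"],
    "de": ["DNB"],
    "fr": ["BNF"],
}
# Order tried when the user's languages map to nothing in the cluster.
FALLBACK_LIBRARIES: List[str] = ["LC", "DNB", "BNF"]


def strip_cruft(heading: str) -> str:
    """Crude stand-in for a library's heading rules (assumption, not real rules)."""
    h = re.sub(r"\(.*?\)", "", heading)                  # drop "(1828-1910)" etc.
    h = re.sub(r",?\s*\d{3,4}\s*-\s*(\d{3,4})?", "", h)  # drop ", 1828-1910"
    h = h.strip(" ,.")
    surname, _, forenames = (p.strip() for p in h.partition(","))
    return f"{forenames} {surname}" if forenames else surname


# One rule per library; in this sketch they all share the same crude rule.
RULES: Dict[str, Callable[[str], str]] = {lib: strip_cruft for lib in FALLBACK_LIBRARIES}


def display_name(cluster: Dict[str, str], user_langs: List[str]) -> str:
    """Pick and clean a heading from a cluster {source_code: heading}."""
    preferred = [lib for lang in user_langs for lib in LANG_TO_LIBRARIES.get(lang, [])]
    for lib in preferred + FALLBACK_LIBRARIES:
        if lib in cluster:
            return RULES.get(lib, strip_cruft)(cluster[lib])
    # Last resort: any heading, from a source we have no specific rules for.
    return strip_cruft(next(iter(cluster.values())))


def precompute(clusters: Dict[str, Dict[str, str]],
               langs: List[str]) -> Dict[Tuple[str, str], str]:
    """Build a (cluster_id, language) -> display name table ahead of time."""
    return {
        (cid, lang): display_name(headings, [lang])
        for cid, headings in clusters.items()
        for lang in langs
    }


# Illustrative cluster; the headings are simplified examples, not real VIAF data.
tolstoy = {
    "LC": "Tolstoy, Leo, 1828-1910",
    "DNB": "Tolstoj, Lev, 1828-1910",
    "BNF": "Tolstoï, Léon (1828-1910)",
}
table = precompute({"example-cluster": tolstoy}, ["en", "de", "fr", "ja"])
```

With these sample headings, an English reader gets "Leo Tolstoy", a German reader "Lev Tolstoj", a French reader "Léon Tolstoï", and a Japanese reader (no mapping in this sketch) falls back to the LC form. A real version would need a separate rule function per library, since their heading conventions differ.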

Tom
_______________________________________________
Ol-tech mailing list
[email protected]
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
To unsubscribe from this mailing list, send email to 
[email protected]
