On Wed, Mar 28, 2012 at 3:45 AM, Karen Coyle wrote:
> On 3/27/12 10:06 PM, Ben Companjen wrote:
>
> This is the beginning of the discussion about RWO's vs. bibliographic
> entities:
>
> http://www.mail-archive.com/[email protected]/msg00076.html
>
> It's an ontological question - what is the entity that is being modeled?

I think you need to model both the author and the bibliographic entry for the author, but the vast majority of the data will be associated with the author. It's only stuff like the "last updated" field that will be associated with the record.

>>> As for using the VIAF ID rather than the individual ID, I'm not entirely
>>> sure about that. As VIAF grows, individual library authority identifiers
>>> can move from one cluster to another. The VIAF id identifies the
>>> cluster, not the individual heading.
>>> The cluster itself does not have a string to match against.
>>
>> I'm not entirely sure what you mean - my understanding of VIAF is that
>> a VIAF ID identifies a person and that the underlying database
>> connects IDs from the individual authority files. I don't know whether
>> VIAF IDs are reused when one becomes obsolete (e.g. after a merge).
>
> We may need to ask, but my understanding is that the VIAF ID identifies
> a cluster of name authority statements that are considered to be for the
> same entity. However, if you look at VIAF you often see more than one
> cluster for the same entity. Presumably these will eventually be
> resolved. The resolution, as I understand it, will be to re-cluster the
> individual name authority entries. I do not know if the previous VIAF ID
> will be redirected to the new cluster. But I am pretty sure that the
> matching actions take place on the individual name authority records,
> not on the clusters.

You're always going to have this issue when cross-referencing among multiple evolving databases. Merges are the easy case: you just have both identifiers resolve to the same record (redirects, owl:sameAs, etc.). The difficult case is handling splits of things that were wrongly merged in the first place.
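A minimal Python sketch of why merges and splits behave so differently for a consumer of the data. It assumes a hypothetical in-memory table of author records. `Record`, `resolve` and `SplitRecordError` are illustrative names, not part of VIAF, Open Library or Freebase. A merged record carries a `redirect_to` pointer, playing the role of a redirect or owl:sameAs. A split record carries a `split_to` list of successor IDs, loosely modeled on Freebase's "split_to" property.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Record:
    """Hypothetical author record in an evolving authority database."""
    id: str
    name: str = ""
    redirect_to: Optional[str] = None                   # set when merged into another record
    split_to: List[str] = field(default_factory=list)   # set when split into several records


class SplitRecordError(Exception):
    """Raised for a split record: the caller must pick a successor itself."""

    def __init__(self, record_id: str, candidates: List[str]):
        super().__init__(f"{record_id} was split into {candidates}")
        self.record_id = record_id
        self.candidates = candidates


def resolve(records: Dict[str, Record], record_id: str, max_hops: int = 10) -> Record:
    """Follow merge redirects transparently; refuse to guess on splits."""
    current = records[record_id]
    for _ in range(max_hops):
        if current.split_to:
            # No way to know which of the successors the caller meant.
            raise SplitRecordError(current.id, list(current.split_to))
        if current.redirect_to is None:
            return current  # a live record: resolution is done
        current = records[current.redirect_to]
    raise RuntimeError(f"redirect chain from {record_id} is too long or cyclic")


records = {
    "A": Record("A", "Author A"),
    "B": Record("B", "Author B"),
    "A-dup": Record("A-dup", redirect_to="A"),  # duplicate merged into A
    "C": Record("C", split_to=["A", "B"]),      # mistaken conflation of A & B
}
print(resolve(records, "A-dup").name)  # Author A
try:
    resolve(records, "C")
except SplitRecordError as err:
    print(err.candidates)  # ['A', 'B']
```

Following redirects can happen silently. The split case can only hand the candidate IDs back to the application, because nothing in the data says which successor a given reference meant.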

If you have authors A, B, & C, where C is actually a mistaken conflation of A & B (i.e. there are really only two authors), you have no choice but to move the appropriate books from C to A and B respectively and then kill the record for C, perhaps replacing it with a set of links to the topics that the data moved to. Freebase does this with a "split_to" property that points to the new topics. There's really no way an application can make transparent use of this the way it can with a redirect or an owl:sameAs assertion.

> The primary issue that I see, however, is that there is no preferred
> form of the name for a cluster - the cluster keeps as preferred forms
> all of the preferred forms from all of the clustered name authority
> records. So at the point that you need to either compare a string to
> something in VIAF, or make use of VIAF, you must operate on the records
> in the cluster - there is no VIAF name data as such. As an ID, VIAF
> identifies a cluster of declarations about a named entity, and those
> declarations can have significant differences.

I don't see the lack of a "preferred" name as an issue. Who prefers that name anyway? The answer is typically some librarian somewhere, not real-world people. Plus, the "name" isn't the author's name, but the name of the bibliographic record, with a bunch of other random data tossed in, like birth dates, pseudonym keywords, etc.

The best name is also culture- and language-dependent. Given all the preferred names from the national libraries, one can use the user's language preferences to generate a name by:

- mapping the user's language to a national library
- finding the preferred name for that national library
- using that national library's rules for forming an "authorized" name to strip out the cruft and get a real name
- applying fallback rules when there's no entry for the language or national library

A strategy like that allows you to support multiple languages and cultures.
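The name-selection steps above can be sketched in Python roughly as follows. Everything here is an assumption for illustration. This includes the language-to-library table (`LANGUAGE_TO_LIBRARY`), the fallback order, and the `default_cleanup` rule, which drops parenthesized qualifiers and date fields and inverts "Surname, Forenames". Real national libraries follow their own cataloguing rules, so library-specific functions would be registered in the hypothetical `CLEANUP_RULES` table. The sample headings are illustrative, not copied from VIAF.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical mapping from a user's language code to a national library.
LANGUAGE_TO_LIBRARY: Dict[str, str] = {"en": "LC", "fr": "BNF", "de": "DNB"}

# Hypothetical fallback order when the user's languages give no match.
FALLBACK_LIBRARIES: List[str] = ["LC", "BNF", "DNB"]

# Per-library rules for undoing an "authorized" heading; empty in this sketch,
# so every library falls through to default_cleanup.
CLEANUP_RULES: Dict[str, Callable[[str], str]] = {}


def default_cleanup(heading: str) -> str:
    """Assumed rule: 'Twain, Mark, 1835-1910' -> 'Mark Twain'."""
    without_qualifiers = re.sub(r"\([^)]*\)", "", heading)  # drop "(...)" qualifiers
    parts = [p.strip() for p in without_qualifiers.split(",")]
    parts = [p for p in parts if p and not re.search(r"\d", p)]  # drop date fields
    if len(parts) >= 2:
        return f"{parts[1]} {parts[0]}"  # invert "Surname, Forenames"
    return parts[0] if parts else heading.strip()


def display_name(preferred: Dict[str, str], user_languages: List[str]) -> Optional[str]:
    """Pick a display name from a cluster's per-library preferred headings."""
    # 1. Map the user's languages to national libraries, in preference order.
    wanted = [LANGUAGE_TO_LIBRARY[lang] for lang in user_languages if lang in LANGUAGE_TO_LIBRARY]
    # 2-4. Take the first library with a heading, trying fallbacks last.
    for library in wanted + FALLBACK_LIBRARIES:
        heading = preferred.get(library)
        if heading:
            return CLEANUP_RULES.get(library, default_cleanup)(heading)
    # Last resort: any heading from any library, chosen deterministically.
    if preferred:
        library = sorted(preferred)[0]
        return CLEANUP_RULES.get(library, default_cleanup)(preferred[library])
    return None


cluster = {"LC": "Twain, Mark, 1835-1910", "BNF": "Twain, Mark (1835-1910)"}
print(display_name(cluster, ["fr", "en"]))  # Mark Twain
```

Each step is a plain lookup followed by a string transformation, so nothing in it depends on a request-time query against the cluster.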

Of course, all the mappings and transformations can be precomputed for efficiency.

Tom
_______________________________________________
Ol-tech mailing list
[email protected]
http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
To unsubscribe from this mailing list, send email to [email protected]
