Okay, again: give me the algorithm software should use to figure out
what title(s) to display to the user (assuming we don't just want
to put out the whole 245, ISBD punctuation and all).
How does software know when to take exactly what from a 240 or 740, and
when to use it as a title label? (A 240 is often useless as a title
label, e.g. "Selections", which is NOT in fact the title of the work it's
attached to, to any user at all. And a work cited in a 740 may or may not
actually be the work in the record at hand; it could also be a 'related'
work in some way.)
I have studied cataloging, rather extensively, because it's interesting,
and because I want to make my software use the bibliographic data as
well as possible. I took all the cataloging classes there were in
library school. I talk to catalogers regularly. I read cataloging
journals like CCQ, listservs like this one, and cataloging blogs.
And I've spent quite a bit of time trying to figure out how to get
things like this out of our actual AACR2/MARC. It is not for lack of
trying or experience or talking to catalogers that I conclude that many
things that catalogers DO spend very expensive expert time encoding in a
record in fact can NOT be reliably or simply extracted
algorithmically. Yeah, you can come up with very complicated rules that
will _mostly_ work (very complicated rules mean much more expensive to
implement), rules that need to be constantly tweaked and enhanced as new
examples are found that they don't work for (again, read 'expensive' in
staff time; writing software costs money in developer time). Sure.
But the original reason I started this sub-thread was to argue that THIS
is the real problem with our data, and one that MarcXML does not touch.
Now, if you believe that it is not possible or feasible to create data
describing bibliographic records that does NOT suffer from problems like
this, that bibliographic items are inherently so complicated that it is
not POSSIBLE to create data that is actually easily used by
developers... well, then we can just disagree on that. But non-library
developers are not going to be eager to use such data just because it's
in XML.
On 1/19/2011 1:28 PM, Weinheimer Jim wrote:
I guess I didn't make myself clear. When there are different titles listed in
the 245, there is *absolutely no reason whatsoever* why a computer would have
to extract those titles automatically since the cataloging rules make very
clear that they are supposed to be traced in separate 240, 246, 7xx$a$t and 740
fields (off the top of my head, probably not an exhaustive list). The parsing
has already been done and the computer work is superfluous. A cataloger knows
this, while a non-cataloger does not. There is no algorithm needed since the
cataloger has done all the work manually.
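In other words, the software side reduces to something like this sketch, assuming fields are parsed into (tag, {code: value}) pairs; the tag list is off the top of my head, not exhaustive:

```python
# Sketch of "the parsing has already been done": just collect the
# traced titles. The tag lists here are illustrative, not exhaustive.

TITLE_TRACING_TAGS = {"240", "246", "730", "740"}  # title in $a
NAME_TITLE_TAGS = {"700", "710", "711"}            # title in $t

def traced_titles(fields):
    """Return every title the cataloger has traced in the record."""
    titles = []
    for tag, sf in fields:
        if tag in TITLE_TRACING_TAGS and sf.get("a"):
            titles.append(sf["a"])
        elif tag in NAME_TITLE_TAGS and sf.get("t"):
            titles.append(sf["t"])
    return titles
```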
In the example in the article, I found the record the author mentioned and showed how the
cataloger had "parsed" it all manually.
Where is the problem with this? Why parse something that doesn't need it? Why
not use the power of the entirety of the records? It seems from your post
that you are maintaining that a title in a 740, 246, or 700$a$t field
is not usable?
James L. Weinheimer [email protected]
Director of Library and Information Services
The American University of Rome
Rome, Italy
First Thus: http://catalogingmatters.blogspot.com/
________________________________________
From: Jonathan Rochkind [[email protected]]
Sent: Wednesday, January 19, 2011 7:03 PM
To: Resource Description and Access / Resource Description and Access
Cc: Weinheimer Jim
Subject: Re: [RDA-L] Linked data
Again, as someone who "knows cataloging rules", if there's an algorithm
you can give me that will let me extract the individual elements (actual
transcribed title vs analytical titles vs parallel titles vs statement
of responsibility) reliably from correct AACR2 MARC, please let me know
what it is.
I am fairly certain there is no such algorithm that is reliable.
I guess you could say that there's no reason to _expect_ that you should
be able to get those elements out of a data record. But most
developers, library or not, will consider bibliographic data from which
you can't reliably extract the title of the item (a pretty basic
attribute, just about the most basic attribute there is) to be pretty
low-value data. They won't change their opinion if you show them the
record serialized in MarcXML instead of ISO Marc21.
All that being an expert in the data gets you is the knowledge that
you _can't_ reliably, algorithmically extract the transcribed title
alone from an arbitrary 245 in MARC/AACR2. It'll work for the
basic cases, but once you start putting in parallel titles, analytics,
and parallel titles of analytical titles, it's a big mess. And such
complicated cases (rare in general, but common in some domains
like music records) are also the ones where the cataloger is most likely
to have gotten the punctuation not EXACTLY right, making it even more
hopeless, even if the programmer did want to write an incredibly
complicated algorithm that tried to take into account the combination of
ISBD punctuation with MARC subfields.
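To make the failure mode concrete, here is roughly what the "simple" version of that parser looks like. The sample 245 data is invented; I'm only claiming the shape of the problem, not these exact records:

```python
# Roughly the "simple" 245 parser: join $a and $b, then split on the
# ISBD delimiters (' = ' parallel title, ' ; ' next work). The sample
# data below is invented to show the shape of the problem.

ISBD_TRAILING = " /:;=,."

def naive_titles(sub_a, sub_b=""):
    """Split a 245 $a/$b pair into title strings using ISBD punctuation."""
    text = " ".join(p for p in (sub_a, sub_b) if p)
    first_language = text.split(" = ")[0]  # drop parallel titles
    works = first_language.split(" ; ")    # separate analytic titles
    return [w.strip(ISBD_TRAILING) for w in works]

# Fine for the easy case:
#   naive_titles("Cartas del Caribe =", "Letters from the Caribbean /")
#   -> ["Cartas del Caribe"]
# But with parallel titles *of* analytics ("Title A = Titre A ; Title B
# = Titre B"), the first ' = ' split throws away Title B entirely, and
# no flat split on punctuation can recover the nesting.
```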
Yes, "many of these issues have been known from the beginning and dealt
with in various ways." That doesn't make the data easily usable by
developers, whether you put it in MarcXML or not. Those "various ways", if
we're talking about software trying to extract elements from bib
records, are hacks: expensive (in developer time) and fragile (they
still won't work all the time).
On 1/19/2011 12:52 PM, Weinheimer Jim wrote:
Jonathan Rochkind wrote:
Concerning: "> "One example of this can be found reported in this article:
http://journal.code4lib.org/articles/3832"
<snip>
Okay, what would someone who "knows library metadata" do to get a
displayable title out of records in an arbitrary corpus of MARC data?
There's an easy answer that only those who know library metadata
(apparently unlike people like Thomale or me who have been working with
it for years) can provide? I have my doubts.
</snip>
I agree that this is an excellent article that everyone should read, but I
wrote a comment myself there (no. 7) discussing how this article illustrates
how important it is to know cataloging rules and/or to work closely with
experienced catalogers when building something like this. It also shows how
many programmers concentrate on certain parts of a record and tend to ignore
the overall view, while catalogers concentrate on whole records.
In this case, the parsing is *always* done manually by the cataloger, who is
directed to make title added entries, along with uniform titles, including the
authors--that is, so long as the cataloger is competent and following the
rules. So, it is always a mistake to concentrate only on a single field, since a
record must be considered in its entirety. It would be unrealistic for
systems people to know these intricacies, but it just shows how important it is
that they work closely with catalogers.
Therefore, it's not *necessarily* arbitrary. Many of these issues have been
known since the very beginning and have been dealt with in various ways.