It's a good article, but also a bit disingenuous. Much more was being
asked for than just a "displayable title," as the author's
dissatisfaction with the initial results makes clear. It would help to
have the full list of expectations stated up front, to make clear that
what is being asked for is itself fairly complex:
- A displayable title
- A recognizable title ("Selections" is not enough, but neither is
  "Symphony no. 3")
- A title which represents the full contents of the object
- A title which makes clear the semantic relationships of its several elements
That's a taller order than what the wording of the article suggests.
To demand, beyond that, that any algorithm proposed for extracting such
a complex piece of data from MARC be reliable across the vast sea of
catalog records, with all their acknowledged variability, is just
silly.
MARC is complex, the cataloging rules are complex, and the objects
they seek to represent are, in many, many cases, complex. Simple
approaches to any of these won't work, unless the bar for what's
expected is set very low.
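
To make the point concrete, here is a minimal sketch in Python of the kind
of simple approach I mean. It is my own illustration, not anything proposed
in the article: the function name, the list-of-pairs representation of a 245
field, and the two sample fields are all invented for the example. It takes
subfield $a and strips the trailing ISBD mark, which is about as low as the
bar can be set.

```python
import re

# Trailing ISBD separators that can end 245 $a: " /", " :", " =", " ;"
_TRAILING_ISBD = re.compile(r"\s*[/:;=]\s*$")


def naive_display_title(field_245):
    """Return 245 $a with its trailing ISBD mark removed.

    field_245 is a list of (subfield_code, value) pairs. This is a
    hypothetical stand-in for whatever a real MARC parser would return.
    """
    for code, value in field_245:
        if code == "a":
            return _TRAILING_ISBD.sub("", value)
    return ""


# Invented sample fields, for illustration only.
simple = [("a", "Moby Dick /"), ("c", "Herman Melville.")]

# An item without a collective title: the second work sits in $b.
no_collective_title = [
    ("a", "Symphony no. 3 ;"),
    ("b", "Overture to Egmont /"),
    ("c", "Beethoven."),
]

print(naive_display_title(simple))               # Moby Dick
print(naive_display_title(no_collective_title))  # Symphony no. 3
```

The first result is fine. The second is displayable but fails the other
three expectations: it names only one of the works on the item and says
nothing about how the pieces of the field relate to each other. Doing better
means interpreting the ISBD punctuation together with the subfield codes,
and that is exactly where the variability in real records starts to bite.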
Stephen
On Wed, Jan 19, 2011 at 12:03 PM, Jonathan Rochkind wrote:
> Again, as someone who "knows cataloging rules", if there's an algorithm you
> can give me that will let me extract the individual elements (actual
> transcribed title vs analytical titles vs parallel titles vs statement of
> responsibility) reliably from correct AACR2 MARC, please let me know what it
> is.
>
> I am fairly certain there is no such algorithm that is reliable.
>
> I guess you could say that there's no reason to _expect_ that you should be
> able to get those elements out of a data record. But most developers,
> library or not, will consider bibliographic data from which you can't
> reliably extract the title of the item (a pretty basic attribute, just about
> the most basic attribute there is) to be pretty low-value data. They won't
> change their opinion if you show them the record serialized in MarcXML
> instead of ISO Marc21.
>
> All that you get by being an expert in the data is the knowledge that you
> _can't_ really reliably algorithmically extract the transcribed title alone
> from any arbitrary 245 of Marc/AACR2. It'll work for the basic cases, but
> once you start putting in parallel titles, analytics, and parallel titles of
> analytical titles, it's a big mess -- and such complicated cases (which are
> rare in general but common in some domains like music records) are also the
> ones where the cataloger is most likely to have gotten the punctuation not
> EXACTLY right, making it even more hopeless, even if the programmer did want
> to write an incredibly complicated algorithm that tried to take into account
> the combination of ISBD punctuation with marc subfields.
>
> Yes, "many of these issues have been known from the beginning and dealt with
> in various ways." That doesn't make the data easily useable by developers,
> whether you put it in MarcXML or not. Those "various ways", if we're talking
> about software trying to extract elements from bib records, are expensive
> (in developer time) and fragile (they still won't work all the time) hacks.
>
> On 1/19/2011 12:52 PM, Weinheimer Jim wrote:
>>
>> Jonathan Rochkind wrote:
>>
>> Concerning: "One example of this can be found reported in this
>> article: http://journal.code4lib.org/articles/3832"
>> <snip>
>> Okay, what would someone who "knows library metadata" do to get a
>> displayable title out of records in an arbitrary corpus of MARC data?
>> There's an easy answer that only those who know library metadata
>> (apparently unlike people like Thomale or me who have been working with
>> it for years) can provide? I have my doubts.
>> </snip>
>>
>> I agree that this is an excellent article that everyone should read, but I
>> wrote a comment myself there (no. 7) discussing how this article illustrates
>> how important it is to know cataloging rules and/or to work closely with
>> experienced catalogers when building something like this. It also shows how
>> many programmers concentrate on certain parts of a record and tend to ignore
>> the overall view, while catalogers concentrate on whole records.
>>
>> In this case, the parsing is *always* done manually by the cataloger, who
>> is directed to make title added entries, along with uniform titles,
>> including the authors--that is, so long as the cataloger is competent and
>> following the rules. So, it is always a mistake to concentrate only on a
>> single field since a record must be considered in its entirety. It
>> would be unrealistic for systems people to know these intricacies, but it
>> just shows how important it is that they work closely with catalogers.
>>
>> Therefore, it's not *necessarily* arbitrary. Many of these issues have
>> been known since the very beginnings and have been dealt with in various
>> ways.
>>
>> James L. Weinheimer
>> Director of Library and Information Services
>> The American University of Rome
>> Rome, Italy
>> First Thus: http://catalogingmatters.blogspot.com/
>
--
Stephen Hearn, Metadata Strategist
Technical Services, University Libraries
University of Minnesota
160 Wilson Library
309 19th Avenue South
Minneapolis, MN 55455
Ph: 612-625-2328
Fx: 612-625-3428