Re: [librecat-dev] A common MARC record path language

Klee, Carsten Sun, 23 Feb 2014 23:41:30 -0800

Hi Thomas and Patrick!

Thank you both for bringing the discussion forward. I must admit that I'm 
having some trouble following it. I read your mails multiple times, really 
trying to understand your requirements. After reading this [1], I hope I'm 
getting closer.


I just want to sum up what I think I've understood so far. Please correct me if 
I'm wrong.

-- With cataloging-based delimiters (punctuation), the content of the subfields 
carries some inner semantics. E.g. "=$b" in field 245 means something different 
than ":$b".

-- There may be data you want to get as a whole, which is spread over multiple 
subfields. This information cannot be described by a range of subfields, but 
only by the closure through punctuation. E.g. in the field

245     00$aHeritage Books archives.$pUnderwood biographical 
dictionary.$nVolumes 1 & 2 revised$h[electronic resource] /$cLaverne 
Galeener-Moore.

the data you want to get is

Heritage Books archives. Underwood biographical dictionary. Volumes 1 & 2 
revised [electronic resource]

Is this what you mean when you say something like "Get me all from field XXX 
until you hit Y"? I guess so.
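
To make the question concrete, here is a minimal Python sketch of how I imagine 
"get me all until you hit Y" could work on the field above. The list-of-pairs 
representation and the name take_until are assumptions made up for this 
illustration; they are not part of MARCspec or of any existing library.

```python
# Hypothetical in-memory form of the 245 field quoted above:
# (subfield code, value) pairs in the order they occur in the record.
FIELD_245 = [
    ("a", "Heritage Books archives."),
    ("p", "Underwood biographical dictionary."),
    ("n", "Volumes 1 & 2 revised"),
    ("h", "[electronic resource] /"),
    ("c", "Laverne Galeener-Moore."),
]


def take_until(subfields, stop_code, closing=" /"):
    """Join subfield values in field order until stop_code is reached."""
    parts = []
    for code, value in subfields:
        if code == stop_code:
            break
        parts.append(value)
    text = " ".join(parts)
    # Drop the punctuation that closes the collected portion, if present.
    if text.endswith(closing):
        text = text[: -len(closing)]
    return text


title = take_until(FIELD_245, "c")
print(title)
```

Here the stop condition is a subfield code ($c) and the closing punctuation 
is only trimmed afterwards. Stopping on the punctuation itself would be a 
different design, and I am not sure which of the two you have in mind.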

-- Therefore the order of subfields is crucial. While MARCspec allows subfields 
to be stated in any order, a result should preserve the order in which the 
subfields occur in the field.
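
A small sketch of what I mean by preserving the order, again in Python. The 
function name select_in_field_order and the data layout are hypothetical and 
only serve as illustration: the requested codes are treated as a set, and the 
field itself dictates the order of the result.

```python
def select_in_field_order(subfields, wanted_codes):
    """Return the wanted subfields in the order they occur in the field.

    `subfields` is a list of (code, value) pairs in record order;
    `wanted_codes` may list the codes in any order.
    """
    wanted = set(wanted_codes)
    return [(code, value) for code, value in subfields if code in wanted]


# Hypothetical example data: a 245 field as (code, value) pairs.
field_245 = [
    ("a", "Otello :"),
    ("b", "in full score /"),
    ("c", "Giuseppe Verdi."),
]

# The spec asks for "c" before "a", but the result follows the field.
selected = select_in_field_order(field_245, "ca")
print(selected)
```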

-- Some fields are linked through specific subfields. There may be some data 
you want to get dependent on linkage from other fields. I'm not sure if I have 
an example for this. Maybe you could provide one.

Finally I've found a nice example on the MARC21 website [2] (section $i - 
Relationship information). So my question is whether you want to achieve 
something like this:

Source:
100  1# $aVerdi, Giuseppe, $d1813-1901.
245  10 $aOtello :$bin full score /$cGiuseppe Verdi.
700  1# $iLibretto based on (work) $aShakespeare, William, $d1564-1616. 
$tOthello.
787  08 $ireproduction of (manifestation) $aVerdi, Giuseppe, 1813-1901. 
$tOtello.$d Milano: Ricordi, c1913

Result (user display):
Verdi, Giuseppe, 1813-1901. Otello : in full score / Giuseppe Verdi
Reproduction of Verdi, Giuseppe, 1813-1901. Otello. Milano : Ricordi, c1913
Libretto based on Shakespeare, William, 1564-1616. Othello.

Is this something you want to express within a MARCspec?
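
As a rough sketch of the processing such a display would need (Python, with 
a made-up function name relationship_display and a made-up list-of-pairs data 
layout): the $i label loses its parenthetical qualifier and is capitalized, 
and the remaining subfields are appended in field order. The punctuation is 
not normalized, so "Milano: Ricordi" stays as it is in the source field.

```python
import re

# Hypothetical representation of the 700 and 787 fields from the example.
FIELD_700 = [
    ("i", "Libretto based on (work) "),
    ("a", "Shakespeare, William, "),
    ("d", "1564-1616. "),
    ("t", "Othello."),
]
FIELD_787 = [
    ("i", "reproduction of (manifestation) "),
    ("a", "Verdi, Giuseppe, 1813-1901. "),
    ("t", "Otello."),
    ("d", " Milano: Ricordi, c1913"),
]


def relationship_display(subfields):
    """Build a display line: cleaned-up $i label plus the other subfields."""
    label = ""
    rest = []
    for code, value in subfields:
        if code == "i":
            # Drop a parenthetical qualifier such as "(work)".
            label = re.sub(r"\s*\([^)]*\)", "", value).strip()
        else:
            rest.append(value.strip())
    if label:
        # Capitalize only the first character, leave the rest untouched.
        rest.insert(0, label[0].upper() + label[1:])
    return " ".join(rest)


print(relationship_display(FIELD_787))
print(relationship_display(FIELD_700))
```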

Anyhow, a collection of use cases is a great idea. That would help to discover 
the tasks a MARCspec should cope with. But I really need your help here. Maybe 
a wider audience would also be helpful?
Cheers!

Carsten

[1] <http://marc-must-die.info/index.php?title=MARC_issues>
[2] <http://www.loc.gov/marc/bibliographic/bd76x78x.html>
_______________________________________________
Carsten Klee
Abt. Überregionale Bibliographische Dienste IIE
Staatsbibliothek zu Berlin - Preußischer Kulturbesitz

Fon:  +49 30 266-43 44 02

> -----Original Message-----
> From: Thomas Berger [mailto:[email protected]]
> Sent: Wednesday, 19 February 2014 23:06
> To: Klee, Carsten; 'Patrick Hochstenbach'
> Cc: [email protected]; [email protected]; [email protected]
> Subject: Re: [librecat-dev] A common MARC record path language
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Hi Carsten,
> 
> > I think the whole problem lies in the limited expressivity of strings.
> > MARCspec is pretty close to XPath in its approach, but without regular
> > expressions and functions like first(), last() etc. But even with XPath
> > it would be pretty hard to get the character before a subfield in a
> > MARCXML file.
> >
> > The only solution I can think of is using regular expressions. And I'm
> > not convinced that bringing this into MARCspec is a good idea. As I
> > already mentioned in the spec, MARCspec is not independent from the
> > application using MARCspec. Taking regular expressions into MARCspec
> > wouldn't make the application more usable, but would blow up the
> > specification.
> 
> Agreed, therefore regular expressions or other /general/ mechanisms
> should not be the way to go (for specifying MARCspecs - specific
> implementations may realize it using a regexp implementation at hand).
> 
> Thus, yes, the limited expressiveness of strings demands that the most
> typical and most important "operations" on MARC records be
> expressible. But if it's too limited (say it could only extract fields
> or has blind spots - parts of record data which cannot be accessed at all)
> it wouldn't be of any use.
> 
> Thus MARCspec needs a convincing approach to the peculiarities of MARC
> records:
> 
> Subfields are not always data elements in a proper sense, sometimes
> they are just marks interspersed into the field content.
> 
> And as Patrick pointed out there is the presence of non-MARC delimiters
> (markup) which is crucial for processing of some (sub)fields.
> 
> Many fields contain "ensembles" of subfields with one nature, accompanied
> by other, more data-like subfields of a different nature:
> 
> - Most subfields in 700 are a simple copy of some (hypothetical) authority
>   record's 100; however, $e and/or $4 denote the function of that person
>   with respect to the work described by the record at hand - and repeatable
>   $0's are just complementary to the "core" subfields, which may well be
>   $a,$b,$c,$d,$f,$g,$j,$k,$l,$n,$p,$q,$t and $u (some of them repeatable,
>   and don't even dare to change anything in their order). Use cases might
>   include /selection/ based on one or more of the more data-like subfields
>   and /reduction/ of the field to a form suitable for further processing
>   (indexing without $e, display including $e, or with deviant formatting of
>   $e, with reference to today's slightly silly discussion on AUTOCAT
>   concerning photographers acting as authors and authors acting as
>   photographers to the perplexity of patrons ...).
> 
> - Same issue with most fields 77X: most subfields pertain to the work,
>   some are the individual "coordinates" within this work for that part
>   described by the given record.
> 
> - The 245 example (and also the $e in 100's) may demonstrate a need to
>   /partition/ a field at certain spots - maybe before or after subfields
>   meeting some content condition.
> 
> - Ubiquitous (in the specification, maybe not in the "field") are $6 and
>   $8's. If MARCspec could make thusly interwoven fields accessible
>   as ensembles - that would be an enormous benefit!
> 
> From my limited experience the "unclear" nature of subfields really is the
> hard part in MARC processing: If you delve into subfield processing too
> early you get data fragments almost or completely impossible to reassemble
> into something meaningful. On the other hand looking at fields as a whole
> gives you more chances to understand what it is about but you're going
> to choke on the weeding out necessary to proceed.
> 
> Thus, maybe due to my limited experience in MARC processing, I'd very much
> appreciate MARCspec as a grammar to formulate those tasks that really
> matter (and are hard to get 100% right). To achieve that - cf.
> Patrick's reply again - one or several "processing paradigms" for MARC
> records should serve as a base and - for clarity's sake - should be made
> explicit in the MARCspec specification.
> 
> Thomas
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1
> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
> 
> iJwEAQECAAYFAlMFKsEACgkQYhMlmJ6W47M05wP/WcjpFrIXlOI/y21kxcYc+XDH
> QHT/8QypD6yKqHM8c7KzcHB8efttB7CQ8mB7cAtqxqQw2oqPzicnkYXIJU9Z9Yxm
> yIaJXPWKovgypLNn4sAjPf2/MsJMYTtCrLOGwWxgp+Uq8bvAuZx5iMr1rKP68PzH
> DCGkPq31KhMT1tUBHMk=
> =EP69
> -----END PGP SIGNATURE-----
