Hi Thomas and Patrick!

Thank you both for moving the discussion forward. I must admit that I'm having some trouble following here. I have read your mails several times, really trying to understand what you are asking for. After reading [1], I hope I'm getting closer.
I just want to sum up what I think I've understood so far. Please correct me if I'm wrong.

-- When it comes to cataloging-based delimiters (punctuation), the content of the subfields carries some inner semantics. E.g. "=$b" in field 245 means something different than ":$b".

-- There may be data you want to get as a whole, which is spread over multiple subfields. This information cannot be described by a range of subfields, but only by its closure through punctuation. E.g. in the field

   245 00$aHeritage Books archives.$pUnderwood biographical dictionary.$nVolumes 1 & 2 revised$h[electronic resource] /$cLaverne Galeener-Moore.

the data you want to get is

   Heritage Books archives. Underwood biographical dictionary. Volumes 1 & 2 revised [electronic resource]

Is this what you mean when you want to say something like "Get me all from field XXX until you hit Y"? I guess so.

-- Therefore the order of subfields is crucial. While MARCspec allows subfields to be stated in any order, a result should preserve the order in which the subfields appear in the field.

-- Some fields are linked through specific subfields. There may be data you want to get depending on links from other fields. I'm not sure I have an example for this; maybe you could provide one. In the end I found a nice example on the MARC21 website [2] (section $i - Relationship information). So my question is whether you want to achieve something like this:

Source:

   100 1# $aVerdi, Giuseppe, $d1813-1901.
   245 10 $aOtello :$bin full score /$cGiuseppe Verdi.
   700 1# $iLibretto based on (work) $aShakespeare, William, $d1564-1616. $tOthello.
   787 08 $ireproduction of (manifestation) $aVerdi, Giuseppe, 1813-1901. $tOtello.$d Milano: Ricordi, c1913

Result (user display):

   Verdi, Giuseppe, 1813-1901. Otello : in full score / Giuseppe Verdi
   Reproduction of Verdi, Giuseppe, 1813-1901. Otello. Milano : Ricordi, c1913
   Libretto based on Shakespeare, William, 1564-1616. Othello.

Is this something you want to express within a MARCspec?
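To make sure I've got the first three points right, here is a minimal Python sketch of what I understand. It assumes a field is held as an ordered list of (subfield code, value) pairs. The names select, until_delimiter and role_of_b are hypothetical helpers for illustration only. They are not MARCspec syntax and not the API of any existing MARC library. The punctuation rules (stop at a trailing "/", read "=" vs ":" before $b) are simplified ISBD conventions and will certainly not cover every record.

```python
from typing import List, Tuple

# Assumption for illustration: a data field is an ordered list of
# (subfield code, value) pairs, kept in the order they appear in the record.
Field = List[Tuple[str, str]]

# The 245 00 example from above.
F245: Field = [
    ("a", "Heritage Books archives."),
    ("p", "Underwood biographical dictionary."),
    ("n", "Volumes 1 & 2 revised"),
    ("h", "[electronic resource] /"),
    ("c", "Laverne Galeener-Moore."),
]


def select(field: Field, codes: str) -> List[str]:
    """Return values of the requested subfields in *field* order,
    regardless of the order in which the codes were requested."""
    return [value for code, value in field if code in codes]


def until_delimiter(field: Field, delimiter: str = "/") -> str:
    """Collect subfield values from the start of the field and stop at the
    first value ending with the given ISBD delimiter (which is dropped)."""
    parts = []
    for _code, value in field:
        text = value.rstrip()
        if text.endswith(delimiter):
            parts.append(text[: -len(delimiter)].rstrip())
            break
        parts.append(text)
    return " ".join(p for p in parts if p)


def role_of_b(field: Field) -> str:
    """Guess the meaning of $b from the punctuation closing the preceding subfield."""
    previous = ""
    for code, value in field:
        if code == "b":
            if previous.endswith("="):
                return "parallel title"
            if previous.endswith(":"):
                return "other title information"
            return "unknown"
        previous = value.rstrip()
    return "absent"


if __name__ == "__main__":
    print(until_delimiter(F245))
    # Heritage Books archives. Underwood biographical dictionary. Volumes 1 & 2 revised [electronic resource]
    print(select(F245, "pa"))
    # ['Heritage Books archives.', 'Underwood biographical dictionary.']
    otello = [("a", "Otello :"), ("b", "in full score /"), ("c", "Giuseppe Verdi.")]
    print(role_of_b(otello))  # other title information
```

Note that select(F245, "pa") returns $a before $p because it follows the field order, not the order in which the codes were requested. That is the order-preservation point above.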
Anyhow, a collection of use cases is a great idea. It would help to discover the tasks a MARCspec should cope with. But I really need your help here. Maybe a wider audience would also be helpful?

Cheers!
Carsten

[1] <http://marc-must-die.info/index.php?title=MARC_issues>
[2] <http://www.loc.gov/marc/bibliographic/bd76x78x.html>

_______________________________________________
Carsten Klee
Abt. Überregionale Bibliographische Dienste IIE
Staatsbibliothek zu Berlin - Preußischer Kulturbesitz

Phone: +49 30 266-43 44 02

> -----Original Message-----
> From: Thomas Berger [mailto:t...@gymel.com]
> Sent: Wednesday, 19 February 2014 23:06
> To: Klee, Carsten; 'Patrick Hochstenbach'
> Cc: v...@gbv.de; librecat-...@mail.librecat.org; perl4lib@perl.org
> Subject: Re: [librecat-dev] A common MARC record path language
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi Carsten,
>
> > I think the whole problem lies in the limited expressivity of strings. MARCspec is pretty close to XPath in its approach, but without regular expressions and functions like first(), last() etc. But even with XPath it would be pretty hard to get the character before a subfield in a MARCXML file.
> >
> > The only solution I can think of is using regular expressions. And I'm not convinced that bringing this into MARCspec is a good idea. As I already mentioned in the spec, MARCspec is not independent from the application using MARCspec. Taking regular expressions into MARCspec wouldn't make the application more usable, but would blow up the specification.
>
> Agreed, therefore regular expressions or other /general/ mechanisms should not be the way to go (for specifying MARCspecs - specific implementations may realize it using a regexp implementation at hand).
>
> Thus, yes, the limited expressiveness of strings demands that the most typical and most important "operations" on MARC records be expressible.
> But if it's too limited (say it could only extract fields or has blind spots - parts of record data which cannot be accessed at all), it wouldn't be of any use.
>
> Thus MARCspecs need a convincing approach to the peculiarities of MARC records:
>
> Subfields are not always data elements in a proper sense; sometimes they are just marks interspersed into the field content.
>
> And as Patrick pointed out, there is the presence of non-MARC delimiters (markup), which is crucial for processing some (sub)fields.
>
> Many fields contain "ensembles" of subfields of one nature, accompanied by other, more data-like subfields of a different nature:
>
> - Most subfields in 700 are a simple copy of some (hypothetical) authority record's 100; however, $e and/or $4 denote the function of that person with respect to the work described by the record at hand - and repeatable $0's are just complementary to the "core" subfields, which may well be $a,$b,$c,$d,$f,$g,$j,$k,$l,$n,$p,$q,$t and $u (some of them repeatable, and don't even dare to change anything in their order). Use cases might include /selection/ based on one or more of the more data-like subfields and /reduction/ of the field to a form suitable for further processing (indexing without $e, display including $e, or with deviant formatting of $e with reference to today's slightly silly discussion on AUTOCAT concerning photographers acting as authors and authors acting as photographers to the perplexity of patrons ...).
>
> - Same issue with most fields 77X: most subfields pertain to the work, some are the individual "coordinates" within this work for the part described by the given record.
>
> - The 245 example (and also the $e in 100's) may demonstrate a need to /partition/ a field at certain spots - maybe before or after subfields meeting some content condition.
>
> - Ubiquitous (in the specification, maybe not in the "field") are $6 and $8's.
> If MARCspecs could make such interwoven fields accessible as ensembles, that would be an enormous benefit!
>
> From my limited experience, the "unclear" nature of subfields really is the hard part in MARC processing: if you delve into subfield processing too early, you get data fragments that are almost or completely impossible to reassemble into something meaningful. On the other hand, looking at fields as a whole gives you more chances to understand what they are about, but you're going to choke on the weeding out necessary to proceed.
>
> Thus, maybe due to my limited experience in MARC processing, I'd very much appreciate MARCspec as a grammar to formulate those tasks that really matter (and are hard to do 100% right). To achieve that - cf. Patrick's reply again - one or several "processing paradigms" for MARC records should serve as a base and - for clarity's sake - should be made explicit in the MARCspec specification.
>
> Thomas
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1
> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>
> iJwEAQECAAYFAlMFKsEACgkQYhMlmJ6W47M05wP/WcjpFrIXlOI/y21kxcYc+XDH
> QHT/8QypD6yKqHM8c7KzcHB8efttB7CQ8mB7cAtqxqQw2oqPzicnkYXIJU9Z9Yxm
> yIaJXPWKovgypLNn4sAjPf2/MsJMYTtCrLOGwWxgp+Uq8bvAuZx5iMr1rKP68PzH
> DCGkPq31KhMT1tUBHMk=
> =EP69
> -----END PGP SIGNATURE-----