Re: [CODE4LIB] MODS and DCTERMS

2010-05-04 Thread Mike Taylor
Having read the rest of this thread, I find that nothing that's been
said changes my initial gut reaction on reading this question: DO NOT
USE DCTERMS.  Its vocabulary is Just Plain Inadequate, and not only
for esoteric cases like the "Alternative Chronological Designation of
First Issue or Part of Sequence" field that Karen mentioned.  Despite
having 70 (seventy!) elements, it's lacking fundamental fields for
describing articles in journals -- there are no journalTitle, volume,
issue, startPage or endPage fields.  That, for me, is a deal-breaker.

(For anyone who wonders: MODS does have a way to represent these
elements, although they are unnecessarily complicated as the example
at
http://www.loc.gov/standards/mods/v3/modsjournal.xml
shows.)
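
Roughly, for anyone who hasn't clicked through: the article-level citation
ends up nested inside a host relatedItem, something like the sketch below
(values are placeholders only):

<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>An example article</title></titleInfo>
  <relatedItem type="host">
    <titleInfo><title>An Example Journal</title></titleInfo>
    <part>
      <detail type="volume"><number>12</number></detail>
      <detail type="issue"><number>3</number></detail>
      <extent unit="pages"><start>100</start><end>115</end></extent>
      <date>2010</date>
    </part>
  </relatedItem>
</mods>

Workable, but hardly the flat journalTitle/volume/issue/startPage/endPage
set you might hope for.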

For anyone who enjoys weeping freely, I recommend the document
Guidelines for Encoding Bibliographic Citation Information in Dublin
Core Metadata, available at
http://dublincore.org/documents/dc-citation-guidelines/index.shtml





On 28 April 2010 17:56, MJ Suhonos m...@suhonos.ca wrote:
 Hi all,

 I'm digging into earlier threads on Code4Lib and NGC4lib and trying to get 
 some concrete examples around the DCTERMS element set — maybe I haven't been 
 a subscriber for long enough.

 What I'm looking for in particular are things I can work with *in 
 code/implementation*, most notably:

 - does there exist a MODS-to-DCTERMS (or vice-versa) crosswalk anywhere?  I 
 see one for collections: 
 http://www.loc.gov/standards/mods/v3/mods-collection-description.html and 
 http://www.loc.gov/marc/marc2dc.html for MARC, but my ideal use case is, eg. 
 an XSLT to turn a MODS document into an XML-encoded DCTERMS document (a rough 
 sketch of what I mean follows after the example below).  Surely someone has 
 done this?

 (I'm sure I've oversimplified or misunderstood something, but hopefully the 
 general approach is understandable)

 - for that matter, is there a good example of how to properly serialize 
 DCTERMS for eg. a converted MARC/MODS record in XML (or RDF/XML)?  I see, eg. 
 http://dublincore.org/documents/dcq-rdf-xml/ which has been replaced by 
 http://dublincore.org/documents/dc-rdf/ but I'm not sure if the latter 
 obviates the former entirely?  Also, the examples at the bottom of the latter 
 don't show, eg. repeated elements or DCMES elements.  Do we abandon 
 http://purl.org/dc/elements/1.1/ entirely?

 For example, is this valid?

 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dcterms="http://purl.org/dc/terms/"
  xmlns:dc="http://purl.org/dc/elements/1.1/">

  <rdf:Description rdf:about="http://example.org/123">
   <dc:title xml:lang="en">Learning Biology</dc:title>
   <dcterms:title xml:lang="en">Learning Biology</dcterms:title>
   <dcterms:alternative xml:lang="en">A primer on biological processes</dcterms:alternative>
   <dcterms:creator xml:lang="en">Bar, Foo</dcterms:creator>
   <dcterms:creator xml:lang="en">Smith, Jane</dcterms:creator>
   <dc:creator xml:lang="en">Bar, Foo</dc:creator>
   <dc:creator xml:lang="en">Smith, Jane</dc:creator>
  </rdf:Description>

 </rdf:RDF>
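
 As for the crosswalk question above: the kind of XSLT I have in mind would be 
 something like this fragment -- purely a sketch of the mapping, not a working 
 stylesheet, and the RDF/XML output wrapper is just one possible choice:

 <xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:mods="http://www.loc.gov/mods/v3"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:dcterms="http://purl.org/dc/terms/">
   <!-- map a handful of MODS elements onto dcterms properties -->
   <xsl:template match="/mods:mods">
     <rdf:RDF>
       <rdf:Description>
         <dcterms:title>
           <xsl:value-of select="mods:titleInfo/mods:title"/>
         </dcterms:title>
         <xsl:for-each select="mods:name">
           <dcterms:creator>
             <xsl:value-of select="mods:namePart"/>
           </dcterms:creator>
         </xsl:for-each>
         <dcterms:issued>
           <xsl:value-of select="mods:originInfo/mods:dateIssued"/>
         </dcterms:issued>
       </rdf:Description>
     </rdf:RDF>
   </xsl:template>
 </xsl:stylesheet>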

 Apologies for any questions that seem silly or naive — I think I have a 
 pretty firm grasp on the levels of abstraction involved, but for the life of 
 me, I can't find much solid stuff about DCTERMS outside of the DCMI website, 
 which can be a bit of a challenge to navigate at times.

 Thanks,
 MJ




Re: [CODE4LIB] MODS and DCTERMS

2010-05-04 Thread Ross Singer
On Tue, May 4, 2010 at 7:55 AM, Mike Taylor m...@indexdata.com wrote:
 Having read the rest of this thread, I find that nothing that's been
 said changes my initial gut reaction on reading this question: DO NOT
 USE DCTERMS.  Its vocabulary is Just Plain Inadequate, and not only
 for esoteric cases like the "Alternative Chronological Designation of
 First Issue or Part of Sequence" field that Karen mentioned.  Despite
 having 70 (seventy!) elements, it's lacking fundamental fields for
 describing articles in journals -- there are no journalTitle, volume,
 issue, startPage or endPage fields.  That, for me, is a deal-breaker.

If you're using Dublin Core as XML, I agree with this.  If you're
using Dublin Core as RDF (which is, honestly, the only thing it's
really good for), this is a non-issue.

-Ross.


Re: [CODE4LIB] MODS and DCTERMS

2010-05-04 Thread Mike Taylor
On 4 May 2010 13:19, Ross Singer rossfsin...@gmail.com wrote:
 On Tue, May 4, 2010 at 7:55 AM, Mike Taylor m...@indexdata.com wrote:
 Having read the rest of this thread, I find that nothing that's been
 said changes my initial gut reaction on reading this question: DO NOT
 USE DCTERMS.  Its vocabulary is Just Plain Inadequate, and not only
 for esoteric cases like the "Alternative Chronological Designation of
 First Issue or Part of Sequence" field that Karen mentioned.  Despite
 having 70 (seventy!) elements, it's lacking fundamental fields for
 describing articles in journals -- there are no journalTitle, volume,
 issue, startPage or endPage fields.  That, for me, is a deal-breaker.

 If you're using Dublin Core as XML, I agree with this.  If you're
 using Dublin Core as RDF (which is, honestly, the only thing it's
 really good for), this is a non-issue.

Oh, what is the solution when using it in RDF?


Re: [CODE4LIB] MODS and DCTERMS

2010-05-04 Thread Ed Summers
On Tue, May 4, 2010 at 8:24 AM, Mike Taylor m...@indexdata.com wrote:
 Oh, what is the solution when using it in RDF?

I've been using the Bibliographic Ontology myself:

  http://bibliontology.com/

Lots of stuff in there for journals, etc ... and reuse of other
vocabularies like event, foaf, prism and (ahem) dcterms.

//Ed


Re: [CODE4LIB] MODS and DCTERMS

2010-05-04 Thread MJ Suhonos
I'd just like to say a word of thanks to everyone who has contributed so far 
on this thread.  The viewpoints raised certainly help clarify at least my 
understanding of some of the issues and concepts involved.

 MARCXML is a step in the right direction. MODS goes even further. Neither 
 really go far enough.


And that succinctly, Eric manages to summarize my (and, I strongly suspect, many 
others') sentiment on the issue at hand.  Of course, the natural follow-on 
question is "go far enough" for *what*, exactly, and this is where my original 
question came from.

It sounds like once again we have the issue that our current tools (MODS, 
DCTERMS) aren't good enough, which means we either have to:

a) stop doing things while we build new, better tools like Karen's 
MARC-in-triples (which seems like a really interesting idea)
or
b) start building imperfect — perhaps highly flawed — things with our current, 
imperfect tools

I'm not nearly smart enough to do a) so my intent is to take a stab at b), or 
else sit back and consider a new line of work entirely (which happens 
distressingly often, usually after reading enough discouraging statements from 
librarians in a given day).

 I think there's a fundamental difference between MODS and DCTERMS that makes 
 this nearly impossible. I've sometimes described this as the difference 
 between "metadata as record format" (MARC, oai_dc, MODS, etc.) and "metadata 
 as vocabulary" (DCTERMS, DCAM, RDF Vocabs in general).

This is a great clarification, and one of the main frustrations I have with 
MODS: it is bound nearly inseparably to XML as a format (and this is coming 
from someone who knows and loves XML dearly).  The idea of DCTERMS/DC/etc as a 
format-independent model seems like a step in the right direction, IMO.

 RDF's grammar comes from the RDF Data Model, and DC's comes from DCAM as well 
 as directly from RDF. The process that Karen Coyle describes is really the 
 only way forward in making a good faith effort to put MARC (the 
 bibliographic data) onto the Semantic Web.

Fair enough.  But I would contend that putting MARC / bib data on the Semantic 
Web is just one use case, even though I realize that, to Semantic Web advocates, 
it's the *only* use case worth considering.

I find it difficult to imagine that building a record format from just a list 
of words is completely useless, especially given that right now there's next 
to *zero* access to bibliographic data from libraries.  Maybe the way to go is 
to just make the MARCXML available via OAI-PMH and OpenSearch and leave it at 
that.

 A more rational approach, IMO, would create a general description set 
 (probably numbering 20-50), then expand that for more detail and for 
 different materials. Users of the sets could define the zones they wish to 
 use in an application profile, so no one would have to carry around data 
 elements that they are sure they will not use. It would also provide a simple 
 but compatible set for folks who don't want to do the whole library 
 description bit.

I agree with this 100%, and conceptually that's what DC and DCTERMS seemed to 
be the basis of, at least to me.  This seems to parallel the MARC approach to 
refinement, which can be expressed as either a hierarchy or a set of 
independent assertions.  Moreover, it's format-independent, so it could be 
serialized as XML, or RDF, or JSON for that matter.  Is this what the RDA 
entities are supposed to achieve?

Let me give another example: the Open Library API returns a JSON tree, eg. 
http://openlibrary.org/books/OL1M.json

But what schema is this?  And if it doesn't conform to a standard schema, does 
that make it useless? If it were based on DCTERMS, at least I'd have a 
reference at http://dublincore.org/documents/dcmi-terms/ to define the 
semantics being used (and an RDF namespace at http://purl.org/dc/terms/ to 
boot).

MJ


Re: [CODE4LIB] MODS and DCTERMS

2010-05-04 Thread Aaron Rubinstein

On 5/4/2010 9:54 AM, Karen Coyle wrote:


BIBO, which many people seem to like, has almost 200 data
elements and classes, and is greatly lacking in some areas (e.g. maps,
music).


What makes BIBO useful, in my limited experience, is that it integrates 
commonly used ontologies like FOAF and DCTERMS.  Also, since it is an 
ontology for RDF description, you can supplement it with other vocabularies 
for specific cases that BIBO doesn't handle.  As Ross Singer just posted as 
I'm writing this:


On 5/4/2010 9:57 AM, Ross Singer wrote:
 In RDF, you can pull in predicates from other namespaces, where the
 attributes you're looking for may be defined. What's nice about this
 is that it works sort of like how namespaces are *supposed* to work in
 XML:  that is, an agent that comes along and grabs your triples will
 parse the assertions from vocabularies it understands and ignore those
 it doesn't.

It's important that we don't look at BIBO or any other bibliographic 
ontology as an "uber vocabulary".  One of the many elegant features of 
RDF, IMHO, is that each specialization can contribute its own 
vocabulary, e.g. general vocabularies like FOAF, DCTERMS, and BIBO can 
be refined by more domain-specific vocabularies like the music 
ontology[1], or ontologies for describing archival collections, sheet 
music, maps...  In fact, having only 200 properties and classes gives 
BIBO an advantage:  it's easy to grok and plays nicely with other 
vocabularies, which could do the heavy lifting for specific resources.
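
Purely as an illustration of that mixing (the item and values here are made 
up, and a domain vocabulary would add its own properties alongside these):

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:bibo="http://purl.org/ontology/bibo/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <bibo:Document rdf:about="http://example.org/items/1">
    <dcterms:title>An example score</dcterms:title>
    <dcterms:creator>
      <foaf:Person>
        <foaf:name>Example Composer</foaf:name>
      </foaf:Person>
    </dcterms:creator>
    <!-- properties from a music (or maps, or archives) vocabulary would
         sit here too; consumers simply ignore what they don't know -->
  </bibo:Document>
</rdf:RDF>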


I feel like it makes the most sense to let domain specialists create 
domain specific vocabularies rather than try to cover every conceivable 
situation in one vocabulary written by a centralized body.


One last thought...  BIBO in particular is developed by a community. 
There is an active listserv[2] and the project leads are very receptive 
to comment.  If there is something important missing, let's help them.


[1] http://musicontology.com/
[2] http://bibliontology.com/community

Aaron


Re: [CODE4LIB] MODS and DCTERMS

2010-05-04 Thread Ross Singer
On Tue, May 4, 2010 at 10:26 AM, Mike Taylor m...@indexdata.com wrote:
 Ross, I think that got mangled in the sending -- either that, or it's
 some strange format that I've never seen before.  That said, I am
 tremendously impressed by all the information you obtained there.
 What software did you use, how much of this did you have to feed it by
 hand, and how much did it intuit from existing structured datasets?

Oh, that's probably not mangled, that's probably just how Turtle looks
:)  I'll also send it as RDF/XML.

That graph was compiled by a Google Scholar search on "Mike Taylor
dinosaur", the Ingenta page describing your article, a text editor
(TextMate) and 30 minutes of my life I'll never get back.

Ok, here's the graph as RDF/XML:

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
   xmlns:bibo="http://purl.org/ontology/bibo/"
   xmlns:dcterms="http://purl.org/dc/terms/"
   xmlns:foaf="http://xmlns.com/foaf/0.1/"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
  <bibo:AcademicArticle rdf:nodeID="article1">
    <dcterms:abstract xml:lang="en">Xenoposeidon proneneukos gen. et
sp. nov. is a neosauropod represented by BMNH R2095, a well-preserved
partial mid-to-posterior dorsal vertebra from the
Berriasian-Valanginian Hastings Beds Group of Ecclesbourne Glen, East
Sussex, England. It was briefly described by Lydekker in 1893, but it
has subsequently been overlooked. This specimen's concave cotyle,
large lateral pneumatic fossae, complex system of bony laminae and
camerate internal structure show that it represents a neosauropod
dinosaur. However, it differs from all other sauropods in the form of
its neural arch, which is taller than the centrum, covers the entire
dorsal surface of the centrum, has its posterior margin continuous
with that of the cotyle, and slopes forward at 35 degrees relative to
the vertical. Also unique is a broad, flat area of featureless bone on
the lateral face of the arch; the accessory infraparapophyseal and
postzygapophyseal laminae which meet in a V; and the asymmetric neural
canal, small and round posteriorly but large and teardrop-shaped
anteriorly, bounded by arched supporting laminae. The specimen cannot
be referred to any known sauropod genus, and clearly represents a new
genus and possibly a new `family'. Other sauropod remains from the
Hastings Beds Group represent basal Titanosauriformes, Titanosauria
and Diplodocidae; X. proneneukos may bring to four the number of
sauropod `families' represented in this unit. Sauropods may in general
have been much less morphologically conservative than is usually
assumed. Since neurocentral fusion is complete in R2095, it is
probably from a mature or nearly mature animal. Nevertheless, size
comparisons of R2095 with corresponding vertebrae in the Brachiosaurus
brancai holotype HMN SII and Diplodocus carnegii holotype CM 84
suggest a rather small sauropod: perhaps 15 m long and 7600 kg in mass
if built like a brachiosaurid, or 20 m and 2800 kg if built like a
diplodocid.</dcterms:abstract>
    <dcterms:creator rdf:resource="_:author1"/>
    <dcterms:creator rdf:resource="_:author2"/>
    <dcterms:isPartOf rdf:resource="_:journal1"/>
    <dcterms:issued
        rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2007-11</dcterms:issued>
    <dcterms:language
        rdf:resource="http://purl.org/NET/marccodes/languages/eng#lang"/>
    <dcterms:subject
        rdf:resource="http://id.loc.gov/authorities/sh85038094#concept"/>
    <dcterms:subject
        rdf:resource="http://id.loc.gov/authorities/sh85097127#concept"/>
    <dcterms:subject
        rdf:resource="http://id.loc.gov/authorities/sh85117730#concept"/>
    <dcterms:title xml:lang="en">AN UNUSUAL NEW NEOSAUROPOD DINOSAUR
FROM THE LOWER CRETACEOUS HASTINGS BEDS GROUP OF EAST SUSSEX,
ENGLAND</dcterms:title>
    <bibo:authorList>
      <rdf:Description>
        <rdf:first rdf:resource="_:author1"/>
        <rdf:rest>
          <rdf:Description>
            <rdf:first rdf:resource="_:author2"/>
            <rdf:rest
                rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"/>
          </rdf:Description>
        </rdf:rest>
      </rdf:Description>
    </bibo:authorList>
    <bibo:doi>10./j.1475-4983.2007.00728.x</bibo:doi>
    <bibo:issue
        rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">6</bibo:issue>
    <bibo:numPages
        rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">18</bibo:numPages>
    <bibo:pageEnd
        rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">1564</bibo:pageEnd>
    <bibo:pageStart
        rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">1547</bibo:pageStart>
    <bibo:pages>1547-1564</bibo:pages>
    <bibo:volume
        rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">50</bibo:volume>
  </bibo:AcademicArticle>
  <bibo:Journal rdf:nodeID="journal1">
    <dcterms:publisher rdf:resource="_:publisher1"/>
    <dcterms:title>Palaeontology</dcterms:title>
    <bibo:issn>0031-0239</bibo:issn>
    <foaf:homepage
        rdf:resource="http://www3.interscience.wiley.com/journal/118531917/home?CRETRY=1&amp;SRETRY=0"/>
  </bibo:Journal>

Re: [CODE4LIB] MODS and DCTERMS

2010-05-04 Thread MJ Suhonos
 Let me give another example: the Open Library API returns a JSON  tree, eg. 
 http://openlibrary.org/books/OL1M.json
 
 But what schema is this?  And if it doesn't conform to a standard  schema, 
 does that make it useless? If it were based on DCTERMS, at  least I'd have a 
 reference at  http://dublincore.org/documents/dcmi-terms/ to define the 
 semantics  being used (and an RDF namespace at http://purl.org/dc/terms/ to  
 boot).
 
 Ah, after my own heart! I have tried to convince the OL folks to translate 
 their data to dcterms, even did a crosswalk for them. Right now they're in 
 panic mode over a major milestone, but once that's over I may ping you to 
 make this request directly to them on one of their lists. If they only hear 
 it from me, it might just be a personal quirk of mine, right?

See, we're on the same page after all.  :-)

Considering one of my primary use cases is direct interoperation with Open 
Library, then yes, I'm all over it.  I'll at least harass Edward and the OL list 
that DC output is important to others beyond just you alone.

I was starting to get discouraged, but now I realize that many of you thought I 
was proposing DCTERMS as a replacement for MARC; not at all.

Imagine Open Library's internal data schema being an easily-serializable model 
based on DCTERMS.  Now imagine every library has a queryable API exactly like 
theirs.  That's where I'm going, and I think (answering my own question above) 
that it *is* potentially useful.
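
To make that concrete -- this is a made-up sketch, not actual Open Library 
output, and the record values are placeholders:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://example.org/books/123">
    <dcterms:title>An Example Book</dcterms:title>
    <dcterms:creator>Example, Author</dcterms:creator>
    <dcterms:issued>1997</dcterms:issued>
    <dcterms:extent>321 p.</dcterms:extent>
    <dcterms:identifier>urn:isbn:0000000000</dcterms:identifier>
  </rdf:Description>
</rdf:RDF>

The same description could just as easily be emitted as plain XML or JSON 
keyed on the dcterms property names.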

 p.s. The JSON API output doesn't require any programming when it uses their 
 data elements; it's doing the crosswalk to dcterms that's been the hangup. 
 Then again... their code is open source, the crosswalk I did is linked from 
 the launchpad entry here [1] so if anyone wants to contribute…

Unfortunately I'm not adept at Python, so writing the code by hand is probably 
a bit beyond me at this point.  But it might make a fun 
learn-Python-in-a-rainy-weekend project.

MJ


Re: [CODE4LIB] MODS and DCTERMS

2010-05-04 Thread MJ Suhonos
No apologies required — your dissection of the (very important) differences 
between MODS and DCTERMS, both in concept and format, was extremely 
enlightening and helpful; as was all the other input.

Any misunderstandings are much more my fault for not being clearer when Ross 
asked what my use case was.  I also made the mistake of referencing RDF, which 
I (now better) understand incorporates a whole universe of world-views that 
unnecessarily complicated things.

Much learned, and as always, much obliged.

MJ

On 2010-05-04, at 3:48 PM, Corey Harper wrote:

 Thank you for this clarification, MJ. I apologize for my initial reaction 
 that there was little value here. Knowing the use-case you define below, I 
 think there's a great deal of value.
 
 Beyond just the pragmatic short-term gains, I think a development like this 
 would help pin-point those areas where said schema functionally requires 
 semantics beyond those in the DCTERMS. All the better if some of those terms 
 just happen to be available in Bibliontology or some other namespace...
 
 Thanks again,
 -Corey
 
 MJ Suhonos wrote:
 Let me give another example: the Open Library API returns a JSON  tree, 
 eg. http://openlibrary.org/books/OL1M.json
 
 But what schema is this?  And if it doesn't conform to a standard  schema, 
 does that make it useless? If it were based on DCTERMS, at  least I'd have 
 a reference at  http://dublincore.org/documents/dcmi-terms/ to define the 
 semantics  being used (and an RDF namespace at http://purl.org/dc/terms/ 
 to  boot).
 Ah, after my own heart! I have tried to convince the OL folks to translate 
 their data to dcterms, even did a crosswalk for them. Right now they're in 
 panic mode over a major milestone, but once that's over I may ping you to 
 make this request directly to them on one of their lists. If they only hear 
 it from me, it might just be a personal quirk of mine, right?
 See, we're on the same page after all.  :-)
 Considering one of my primary use cases is direct interoperation with Open 
 Library then yes, I'm all over it.  I'll at least harass Edward and the OL 
 list that DC output is important to others beyond just you alone.
 I was starting to get discouraged, but now I realize that many of you 
 thought I was proposing DCTERMS as a replacement for MARC; not at all.
 Imagine Open Library's internal data schema being an easily-serializable 
 model based on DCTERMS.  Now imagine every library has a queryable API 
 exactly like theirs.  That's where I'm going, and I think (answering my own 
 question above) that it *is* potentially useful.
 p.s. The JSON API output doesn't require any programming when it uses their 
 data elements; it's doing the crosswalk to dcterms that's been the hangup. 
 Then again... their code is open source, the crosswalk I did is linked from 
 the launchpad entry here [1] so if anyone wants to contribute…
 Unfortunately I'm not adept at Python, so writing the code by hand is 
 probably a bit beyond me at this point.  But it might make a fun 
 learn-Python-in-a-rainy-weekend project.
 MJ
 
 -- 
 Corey A Harper
 Metadata Services Librarian
 New York University Libraries
 20 Cooper Square, 3rd Floor
 New York, NY 10003-7112
 212.998.2479
 corey.har...@nyu.edu


Re: [CODE4LIB] MODS and DCTERMS

2010-05-03 Thread Riley, Jenn
Hi MJ,

 - for that matter, is there a good example of how to properly
 serialize DCTERMS for eg. a converted MARC/MODS record in XML (or
 RDF/XML)?  I see, eg. http://dublincore.org/documents/dcq-rdf-xml/
 which has been replaced by http://dublincore.org/documents/dc-rdf/
 but I'm not sure if the latter obviates the former entirely?  Also, the
 examples at the bottom of the latter don't show, eg. repeated elements
 or DCMES elements.  Do we abandon http://purl.org/dc/elements/1.1/
 entirely?

This has always been ridiculously confusing! Here's my understanding (though 
anyone else, please chime in and correct me if I've misunderstood):

- With the maturation of the DCMI Abstract Model 
http://dublincore.org/documents/abstract-model/, new bindings were needed to 
express features of the model not obvious in the old RDF, XML, and XHTML 
bindings.

- For RDF, http://dublincore.org/documents/dc-rdf/ is stable and fully 
intended to replace http://dublincore.org/documents/dcq-rdf-xml/.

- For XML (the non-RDF sort), the most current document is 
http://dublincore.org/documents/dc-ds-xml/, though note its status is still 
(after 18 months) only a proposed recommendation. This document itself replaces 
a transition document http://dublincore.org/documents/2006/05/29/dc-xml/ from 
2006 that never got beyond Working Draft status. To get a stable XML binding, 
you have to go all the way back to 2003 
http://dublincore.org/documents/dc-xml-guidelines/index.shtml, a binding 
which predates much of the current DCMI Abstract Model.

- Many found the 2003 XML binding unsatisfactory in that it prescribed the 
format for individual dc and dcterms properties, but not a full XML format - 
that is, there was no DC-sanctioned XML root element for a qualified DC 
record. (This gets at the very heart of the difference in perspective between 
RDF and XML, properties and elements, etc., I think, but I digress...) The 
folks I'm aware of that developed workarounds for this were those sharing QDC 
over OAI-PMH. I find the UIUC OAI registry 
http://oai.grainger.uiuc.edu/registry/ helpful for investigations of this 
sort. A quick glance at their report on "Distinct Metadata Schemas used in 
OAI-PMH data providers" http://oai.grainger.uiuc.edu/registry/ListSchemas.asp 
seems to suggest that CONTENTdm uses this schema for QDC 
http://epubs.cclrc.ac.uk/xsd/qdc.xsd and DSpace uses this one 
http://dublincore.org/schemas/xmls/qdc/2006/01/06/dcterms.xsd. The latter 
doesn't actually define a root element either, but since here at least the QDC 
is inside the wrappers the OAI-PMH response requires, it's well-formed (a rough 
sketch of what this looks like in practice follows at the end of this message). 
What someone does with that once they get it and unpack it, I 
don't know, since without a container it won't be well-formed XML. The former 
goes through several levels of importing other things and eventually ends up 
importing from an .xsd on the Dublin Core site, but they define a root element 
themselves along the way. (I think.)

- So what does one do? I guess it depends on who your target consumers of this 
data are. If you're looking to work with more traditional library environments, 
perhaps those that are using CONTENTdm, etc., the legacy hack-ish format might 
be the best. (I'm part of an initiative to revitalize the Sheet Music 
Consortium http://digital.library.ucla.edu/sheetmusic/ and lots of our 
potential contributors are CONTENTdm users, so I think this is the direction 
I'm going to take that project.) But if you're wanting to talk to DCMI-style 
folks, the dc-ds-xml, or more likely the dc-rdf option seems more attractive. 
I'm afraid I'm not much help with the implementation details of dc-rdf, though. 
One of the DC mailing lists would be, I suspect. There are a lot of 
active members there.
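
For a sense of what that OAI-PMH workaround looks like in practice, here is a 
rough sketch (the qdc container element and its namespace are exactly the kind 
of local invention I mean -- nothing DCMI-sanctioned -- and the values are 
invented):

<record>
  <header>
    <identifier>oai:example.org:123</identifier>
    <datestamp>2010-05-03</datestamp>
  </header>
  <metadata>
    <qdc:qualifieddc xmlns:qdc="http://example.org/qdc/"
                     xmlns:dc="http://purl.org/dc/elements/1.1/"
                     xmlns:dcterms="http://purl.org/dc/terms/">
      <dc:title>An example title</dc:title>
      <dc:creator>Example, Author</dc:creator>
      <dcterms:issued>2001</dcterms:issued>
      <dcterms:isPartOf>An example collection</dcterms:isPartOf>
    </qdc:qualifieddc>
  </metadata>
</record>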

Ick, huh? :-)

Jenn


Jenn Riley
Metadata Librarian
Digital Library Program
Indiana University - Bloomington
Wells Library W501
(812) 856-5759
www.dlib.indiana.edu

Inquiring Librarian blog: www.inquiringlibrarian.blogspot.com


Re: [CODE4LIB] MODS and DCTERMS

2010-05-03 Thread Jonathan Rochkind
I'm still confused about all this stuff too, but I've often seen the 
oai_dc format (for OAI-PMH, I think?) used as a 'standard' way to expose 
simple DC attributes.


One thing I was confused about was whether the oai_dc format _required_ 
the use of the old-style DC URIs, or also allowed the use of the 
DCTERMS URIs?   Anyone know?  I kind of think it actually requires the 
old-style DC URIs, as it was written before dcterms. 

At least it is one standardized way to expose the old basic DC elements, 
with a specific XML schema.
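
For reference, an oai_dc record looks roughly like this (values invented), 
with its elements drawn from the 1.1 namespace:

<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/
                               http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:title>An example title</dc:title>
  <dc:creator>Example, Author</dc:creator>
  <dc:date>2001</dc:date>
</oai_dc:dc>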


Jonathan

Riley, Jenn wrote:

Hi MJ,

  

- for that matter, is there a good example of how to properly
serialize DCTERMS for eg. a converted MARC/MODS record in XML (or
RDF/XML)?  I see, eg. http://dublincore.org/documents/dcq-rdf-xml/
which has been replaced by http://dublincore.org/documents/dc-rdf/
but I'm not sure if the latter obviates the former entirely?  Also, the
examples at the bottom of the latter don't show, eg. repeated elements
or DCMES elements.  Do we abandon http://purl.org/dc/elements/1.1/
entirely?



This has always been ridiculously confusing! Here's my understanding (though 
anyone else, please chime in and correct me if I've misunderstood):

- With the maturation of the DCMI Abstract Model 
http://dublincore.org/documents/abstract-model/, new bindings were needed to 
express features of the model not obvious in the old RDF, XML, and XHTML bindings.

- For RDF, http://dublincore.org/documents/dc-rdf/ is stable and fully intended to 
replace http://dublincore.org/documents/dcq-rdf-xml/.

- For XML (the non-RDF sort), the most current document is 
http://dublincore.org/documents/dc-ds-xml/, though note its status is still (after 18 
months) only a proposed recommendation. This document itself replaces a transition document 
http://dublincore.org/documents/2006/05/29/dc-xml/ from 2006 that never got beyond 
Working Draft status. To get a stable XML binding, you have to go all the way back to 2003 
http://dublincore.org/documents/dc-xml-guidelines/index.shtml, a binding which predates 
much of the current DCMI Abstract Model.

- Many found the 2003 XML binding unsatisfactory in that it prescribed the format for individual dc and dcterms 
properties, but not a full XML format - that is, there was no DC-sanctioned XML root element for a qualified DC 
record. (This gets at the very heart of the difference in perspective between RDF and XML, properties 
and elements, etc., I think, but I digress...) The folks I'm aware of that developed workarounds for this were 
those sharing QDC over OAI-PMH. I find the UIUC OAI registry http://oai.grainger.uiuc.edu/registry/ 
helpful for investigations of this sort. A quick glance at their report on Distinct Metadata Schemas used in 
OAI-PMH data providers http://oai.grainger.uiuc.edu/registry/ListSchemas.asp seems to suggest that 
CONTENTdm uses this schema for QDC http://epubs.cclrc.ac.uk/xsd/qdc.xsd and DSpace uses this one 
http://dublincore.org/schemas/xmls/qdc/2006/01/06/dcterms.xsd. The latter doesn't actually define a root 
element either, but since here at least the QDC is inside the wrappers the 
OAI-PMH response requires it's 
well-formed. What someone does with that once they get it and unpack it, I 
don't know, since without a container it won't be well-formed XML. The former 
goes through several levels of importing other things and eventually ends up 
importing from an .xsd on the Dublin Core site, but they define a root element 
themselves along the way. (I think.)

- So what does one do? I guess it depends on who your target consumers of this data 
are. If you're looking to work with more traditional library environments, perhaps 
those that are using CONTENTdm, etc. the legacy hack-ish format might be the best. 
(I'm part of an initiative to revitalize the Sheet Music Consortium 
http://digital.library.ucla.edu/sheetmusic/ and lots of our potential 
contributors are CONTENTdm users, so I think this is the direction I'm going to take 
that project.) But if you're wanting to talk to DCMI-style folks, the dc-ds-xml, or 
more likely the dc-rdf option seems more attractive. I'm afraid I'm not much help 
with the implementation details of dc-rdf, though. One of the DC mailing list would 
be, though, I suspect. There are a lot of active members there.

Ick, huh? :-)

Jenn


Jenn Riley
Metadata Librarian
Digital Library Program
Indiana University - Bloomington
Wells Library W501
(812) 856-5759
www.dlib.indiana.edu

Inquiring Librarian blog: www.inquiringlibrarian.blogspot.com

  


Re: [CODE4LIB] MODS and DCTERMS

2010-05-03 Thread Ross Singer
Out of curiosity, what is your use case for turning this into DC?
That might help those of us that are struggling to figure out where to
start with trying to help you with an answer.

-Ross.

On Mon, May 3, 2010 at 11:46 AM, MJ Suhonos m...@suhonos.ca wrote:
 Thanks for your comments, guys.  I was beginning to think the lack of 
 response indicated that I'd asked something either heretical or painfully 
 obvious.  :-)

 That's my understanding as well. oai_dc predates the defining of the 15 
 legacy DC properties in the dcterms namespace, and it's my guess nobody saw 
 a reason to update the oai_dc definition after this happened.

 This is at least part of my use case — we do a lot of work with OAI on both 
 ends, and oai_dc is pretty limited due to the original 15 elements.  My 
 thinking at this point is that there's no reason we couldn't define something 
 like oai_dcterms and use the full QDC set based on the updated profile.  
 Right?
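
 Purely hypothetically, I picture something with the same shape as oai_dc but 
 drawing on the dcterms namespace -- none of this exists, and the container 
 namespace below is invented:

 <oai_dcterms:dcterms
     xmlns:oai_dcterms="http://example.org/oai_dcterms/"
     xmlns:dcterms="http://purl.org/dc/terms/">
   <dcterms:title>An example title</dcterms:title>
   <dcterms:alternative>An alternative title</dcterms:alternative>
   <dcterms:creator>Example, Author</dcterms:creator>
   <dcterms:issued>2001</dcterms:issued>
   <dcterms:isPartOf>An example series</dcterms:isPartOf>
 </oai_dcterms:dcterms>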

 FWIW, I'm not limited to any legacy ties; in fact, my project is aimed at 
 pushing the newer, DC-sanctioned ideas forward, so I suspect in my case 
 using an XML serialization that validates against http://purl.org/dc/terms/ 
 is probably sufficient (whether that's RDF or not doesn't matter at this 
 point).

 So, back to the other part of the question:  has anybody seen a MODS — 
 DCTERMS crosswalk in the wild?  It looks like there's a lot of similarity 
 between the two, but before I go too deep down that rabbit hole, I'd like to 
 make sure someone else hasn't already experienced that, erm, joy.

 MJ



Re: [CODE4LIB] MODS and DCTERMS

2010-05-03 Thread MJ Suhonos
 dcterms is so terribly lossy that it would be a shame to reduce MARC to it.

This is *precisely* the other half of my rationale — a shame?  Why?  If MARC is 
the mind prison that some purport it to be, then let's see what a system built 
devoid of MARC, but based on the best alternative we have looks like.

That may well *not* be DCTERMS, but I do like the DCAM model, and there are 
plenty of non-library systems out there that speak simple DC (OAI-PMH is one 
example from this thread alone).  Being conceptually RDF-compatible is just a 
bonus for me.

This would be an incentive for them to at least consider implementing DCTERMS, 
which may be terribly lossy compared to MARC, but is a huge increase in 
expressivity compared to simple DC.  Integrating MARC-based records and 
DC-based records from OAI sources in a single database could be a useful thing 
to play with.

 What we need, ASAP, is a triple form of MARC (and I know some folks have 
 experimented with this...) and a translation from MARC to the RDA elements that 
 have been registered in RDF. However, I hear that JSC is going to be adding 
 more detail to the RDA elements so that could mean changes coming down the 
 pike.  I am interested in working on MARC as triples, which I see as a 
 transformation format. I have a database of MARC elements that might be a 
 crude basis for this.

This seems like it's looking to accomplish different goals than I am, but 
obviously if there's a MARC-as-triples intermediary that's workable *today* 
then I'd be happy to use that instead.  But I wonder: how navigable is it by 
people who don't understand MARC?  How much loss is potentially involved?

 QDC basically represents the same things as dcterms, so you can
 probably just take the existing XSLT and hack on it until it
 represents something that looks more like dcterms than qdc.

Yeah, that might be easier than mapping from MODS, though I'll have to see how 
much I can look at a MARC-based XSLT before my brain melts.  Hopefully it 
wouldn't take *too* much work.

 That won't address the issue of breaking up the MARC into
 individual resources, however.  You mention that you are looking for
 the short hop to RDF, but this is just going to give you a big pile of
 literals for things like creator/contributor/subject, etc.  I'm not
 really sure what the win would be, there.

Well, a MARC-as-triples approach would suffer from the same problem just as 
much, at least initially.  I think the issue of converting literals into URIs 
is an important second step, but let's get the literals into a workable format 
first.

I should clarify that my ultimate goal isn't to find a magical easy way to RDF, 
but rather to try to realize a way for libraries to get their data into a 
format that others are able and willing to play with.  I'm betting on the 
notion that the majority of (presumably non-librarian) users would rather have 
incomplete data in a format that they can understand and manipulate, rather 
than have to learn MARC.  I certainly would, and I'm a librarian (though 
probably a poor one because I don't understand or highly value MARC).

Naive? Heretical? Probably.  But worth a shot, I think.

MJ


Re: [CODE4LIB] MODS and DCTERMS

2010-05-03 Thread MJ Suhonos
 NB: When Karen Coyle, Eric Morgan, and Roy Tennant all reply to your thread 
 within half an hour of each other, you know you've hit the big time.  Time to 
 retire young I think.

That would be Eric *Lease* Morgan — oh my god, you're right!  I'm already 
losing data!  It *is* insidious!  I repent!

MJ


Re: [CODE4LIB] MODS and DCTERMS

2010-05-03 Thread Aaron Rubinstein

On 5/3/2010 1:55 PM, Karen Coyle wrote:


1. MARC the data format -- too rigid, needs to go away
2. MARC21 bib data -- very detailed, well over 1,000 different data
elements, some well-coded data (not all); unfortunately trapped in #1


For the sake of my own understanding, I would love an explanation of the 
distinction between #1 and #2...  Re: #2, how is bibliographic data 
encoded in MARC any different than bibliographic data encoded in some 
other format?  Without the encoding format, you just have a pile of 
strings, right?  I agree that we have lots of rich bibliographic data 
encoded in MARC and it is an exciting possibility to move it out of MARC 
into other, more flexible formats.  Why, then, do we need to migrate the 
'elements' of the encoding format as well?  Taking one look at MARCXML 
makes it clear that the structure of MARC is not well suited to 
contemporary, *interoperable*, data formats.


Is there something specific to MARC that is not potentially covered by 
MODS/DCTERMS/BIBO/??? that I'm missing?


Thanks,

Aaron


Re: [CODE4LIB] MODS and DCTERMS

2010-05-03 Thread Bill Dueber
On Mon, May 3, 2010 at 2:40 PM, MJ Suhonos m...@suhonos.ca wrote:

 Yes, even to me as a librarian but not a cataloguer, many (most?) of these
 elements seem like overkill.  I have no doubt there is an edge-case for
 having this fine level of descriptive detail, but I wonder:

 a) what proportion of records have this level of description
 b) what kind of (or how much) user access justifies the effort in creating
 and preserving it


On many levels, I agree. Or I wish I could.

If you look at a business model like Amazon, for example, it's easy to
imagine that their overriding goal is, "Make the easy-to-find stuff
ridiculously easy to find." The revenue they get from someone finding an
edge-case book is exactly the same as the revenue they get from someone
buying Harry Potter. The ROI is easy to think about.

But I work in an academic library. In a lot of ways, our *primary audience*
is some grad student 12 years from now who needs one trivial piece of crap
to make it all come together in her head. I know we have thousands of books
that have never been looked at, but computing the ROI on someone being able
to see them some day is difficult. Maybe it's zero. Maybe not. We just can't
tell.

Now, none of this is to say that MARC/AACR2 is necessarily the best (or even
a good) way to go about making these works findable. I'm just saying that
evaluating the edge cases in terms of user access is a complicated
business.

  -Bill-

-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library


Re: [CODE4LIB] MODS and DCTERMS

2010-05-03 Thread Beacom, Matthew
Although I agree with Roy's suggestion that librarians not gloat about our 
metadata, the notion that the value of a data element can be elicited from the 
frequency of its use in the overall domain of library materials is misleading 
and contrary to the report Roy cites. 

The sub-section of the very useful and informative OCLC report that Roy cites 
is very good on this point. Section 2, "MARC Tag Usage in WorldCat," by Karen 
Smith-Yoshimura clearly lays out the data in the context of WorldCat and the 
cataloging practice of the OCLC members.  

Library holdings are dominated by texts and in terms of titles cataloged texts 
are dominated by books. This preponderance of books tilts the ratios of use per 
individual data elements. Many data elements pertain to either a specific form 
of material, manuscripts, for instance. Others pertain to specific content, 
musical notation, for instance. Some pertain to both, manuscript scores, for 
instance. Within the total aggregate of library materials, data elements that 
are specific per material or content do not rise in usage rates to anything 
near 20% of the aggregate total of titles. Yet these elements are necessary or 
valuable to those wishing to discover and use the materials, and when one 
recalls that 1% use rates in WorldCat equal about 1,000,000 titles, the 
usefulness of many MARC data elements can be seen as widespread.

According to the report, 69 MARC tags occur in more than 1% of the records in 
WorldCat.  That is quite a few more than Roy's 11, but even accounting for 
Karen's data elements being equivalent to the number of MARC sub-fields, this is 
far fewer than the 1,000 data elements available to a cataloger in MARC. 

Matthew Beacom


By the way, the descriptive fields used in more than 20% of the MARC records in 
WorldCat are:

245 Title statement 100%
260 Imprint statement 96%
300 Physical description 91%
100 Main entry - personal name 61%
650 Subject added entry - topical term 46%
500 General note 44%
700 Added entry - personal name 28%

They answer, more or less, a few basic questions a user might have about the 
material:
What is it called? Who made it? When was it made? How big is it? What is it 
about? Answers to the question, How can I get it? are usually given in the 
associated MARC holdings record. 
 

-Original Message-
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Roy 
Tennant
Sent: Monday, May 03, 2010 2:15 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MODS and DCTERMS

I would even argue with the statement "very detailed, well over 1,000
different data elements, some well-coded data (not all)". There are only 11
(yes, eleven) MARC fields that appear in 20% or more of MARC records
currently in WorldCat[1], and at least three of those elements are control
numbers or other elements that contribute nothing to actual description. I
would say overall that we would do well to not gloat about our metadata
until we've reviewed the facts on the ground. Luckily, now we can.
Roy

[1] http://www.oclc.org/research/publications/library/2010/2010-06.pdf

On Mon, May 3, 2010 at 11:03 AM, Eric Lease Morgan emor...@nd.edu wrote:

 On May 3, 2010, at 1:55 PM, Karen Coyle wrote:

  1. MARC the data format -- too rigid, needs to go away
  2. MARC21 bib data -- very detailed, well over 1,000 different data
  elements, some well-coded data (not all); unfortunately trapped in #1



 The differences between the two points enumerated above, IMHO, seem to be
 at the heart of the never-ending debate between computer types and
 cataloger types when it comes to library metadata. The non-library computer
 types don't appreciate the value of human-aided systematic description. And
 the cataloger types don't understand why MARC is a really terrible bit
 bucket, especially considering the current environment. All too often the
 two camps don't know to what the other is speaking. MARC must die. Long
 live MARC.

 --
 Eric Lease Morgan



Re: [CODE4LIB] MODS and DCTERMS

2010-05-03 Thread Karen Coyle

Quoting Beacom, Matthew matthew.bea...@yale.edu:



According to the report, 69 MARC tags occur in more than 1% of the   
records in WorldCat.  That is quite a few more than the Roy's 11,   
but even accounting for Karen's data elements being equivalent to   
the number of MARC sub-fields this is far fewer than the 1,000 data   
elements available to a cataloger in MARC.


So much depends on how you count things, so at the  
http://kcoyle.net/rda/ site I have put two MARC-related files. The  
first is just a list of elements (variable subfields) in alpha order  
with duplicates removed. Yes, I realize how imperfect this is, and  
that we will need to look beyond names to *meaning* of elements to  
determine what we really have. This file does not include indicators,  
and sometimes indicators really do create a separate element, like  
when "person name" becomes "Family" based on its indicator.


That file has over 560 entries.

The next file probably needs some more thought, but it is a list of  
the variable field indicators and subfields, leaving in subfields that  
are duplicated in different fields. I removed some of the numeric  
subfields that didn't seem to result in an actual elements (2, 3, 5,  
6, 8), but could be wrong about that. I also did not include  
indicators that are = Undefined. We can debate whether a personal  
name in an added entry is the same element as a personal name in a  
subject heading, and similarly for the various places where geographic  
names are used, titles, etc etc etc. This is the analysis that is  
needed to reduce MARC21 to a cleaner set of data elements.


That file has 1421 entries.

Neither of these contains any of the fixed field elements (many of  
which, IMO, should replace textual elements now carried in MARC21).  
When I looked at the fixed fields (and this is reported at  
http://futurelib.pbworks.com/Data+and+Studies), I came up with this  
count of *unique* fixed field elements (each with multiple values):


008 - 58
007 - 55

Each one of these should become a controlled value list in a SemWeb  
implementation of MARC. RDA appears to have a total of 68 defined  
value lists, but I don't believe that those include ones defined  
elsewhere, such as languages, country codes, etc.


kc

p.s. linked from that same page is the file I am using for this  
analysis, in CSV format, if anyone else wants to play with it. I have  
tried to keep it up to date with MARBI proposals.




Matthew Beacom


By the way, the descriptive fields used in more than 20% of the MARC  
 records in WorldCat are:


245 Title statement 100%
260 Imprint statement 96%
300 Physical description 91%
100 Main entry - personal name 61%
650 Subject added entry - topical term 46%
500 General note 44%
700 Added entry - personal name 28%

They answer, more or less, a few basic questions a user might have   
about the material:
What is it called? Who made it? When was it made? How big is it?   
What is it about? Answers to the question, How can I get it? are   
usually given in the associated MARC holdings record.



-Original Message-
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf  
 Of Roy Tennant

Sent: Monday, May 03, 2010 2:15 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MODS and DCTERMS

I would even argue with the statement very detailed, well over 1,000
different data elements, some well-coded data (not all). There are only 11
(yes, eleven) MARC fields that appear in 20% or more of MARC records
currently in WorldCat[1], and at least three of those elements are control
numbers or other elements that contribute nothing to actual description. I
would say overall that we would do well to not gloat about our metadata
until we've reviewed the facts on the ground. Luckily, now we can.
Roy

[1] http://www.oclc.org/research/publications/library/2010/2010-06.pdf

On Mon, May 3, 2010 at 11:03 AM, Eric Lease Morgan emor...@nd.edu wrote:


On May 3, 2010, at 1:55 PM, Karen Coyle wrote:

 1. MARC the data format -- too rigid, needs to go away
 2. MARC21 bib data -- very detailed, well over 1,000 different data
 elements, some well-coded data (not all); unfortunately trapped in #1



The differences between the two points enumerated above, IMHO, seem to be
at the heart of the never-ending debate between computer types and
cataloger types when it comes to library metadata. The non-library computer
types don't appreciate the value of human-aided systematic description. And
the cataloger types don't understand why MARC is a really terrible bit
bucket, especially considering the current environment. All too often the
two camps don't know to what the other is speaking. MARC must die. Long
live MARC.

--
Eric Lease Morgan







--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234

skype: kcoylenet


Re: [CODE4LIB] MODS and DCTERMS

2010-05-03 Thread Roy Tennant
Thanks, Matthew, for a much more nuanced and accurate depiction of the data.
I would encourage anyone interested in this topic to spend some time with
this report, which was one result of a great deal of work by many people in
research institutions around the world. The findings and recommendations are
well worth your time.
Roy

On Mon, May 3, 2010 at 11:55 AM, Beacom, Matthew matthew.bea...@yale.eduwrote:

 Although I agree with Roy's suggestion that librarians not gloat about our
 metadata, the notion that the value of a data element can be elicited from
 the frequency of its use in the overall domain of library materials is
 misleading and contrary to the report Roy cites.

 The sub-section of the very useful and informative OCLC report that Roy
 cites is very good on this point. Section 2. MARC Tag Usage in WorldCat by
 Karen Smith-Yoshimura clearly lays out the data in the context of WorldCat
 and the cataloging practice of the OCLC members.

 Library holdings are dominated by texts and in terms of titles cataloged
 texts are dominated by books. This preponderance of books tilts the ratios
 of use per individual data elements. Many data elements pertain to either a
 specific form of material, manuscripts, for instance. Others pertain to
 specific content, musical notation, for instance. Some pertain to both,
 manuscript scores, for instance. Within the total aggregate of library
 materials, data elements that are specific per material or content do not
 rise in usage rates to anything near 20% of the aggregate total of titles.
 Yet these elements are necessary or valuable to those wishing to discover
 and use the materials, and when one recalls that 1% use rates in WorldCat
 equal about 1,000,000 titles the usefulness of many MARC data elements can
 be seen as widespread.

 According to the report, 69 MARC tags occur in more than 1% of the records
 in WorldCat.  That is quite a few more than the Roy's 11, but even
 accounting for Karen's data elements being equivalent to the number of MARC
 sub-fields this is far fewer than the 1,000 data elements available to a
 cataloger in MARC.

 Matthew Beacom


 By the way, the descriptive fields used in more than 20% of the MARC
 records in WorldCat are:

 245 Title statement 100%
 260 Imprint statement 96%
 300 Physical description 91%
 100 Main entry - personal name 61%
 650 Subject added entry - topical term 46%
 500 General note 44%
 700 Added entry - personal name 28%

 They answer, more or less, a few basic questions a user might have about
 the material:
 What is it called? Who made it? When was it made? How big is it? What is it
 about? Answers to the question, How can I get it? are usually given in the
 associated MARC holdings record.


 -Original Message-
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Roy Tennant
 Sent: Monday, May 03, 2010 2:15 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] MODS and DCTERMS

 I would even argue with the statement very detailed, well over 1,000
 different data elements, some well-coded data (not all). There are only 11
 (yes, eleven) MARC fields that appear in 20% or more of MARC records
 currently in WorldCat[1], and at least three of those elements are control
 numbers or other elements that contribute nothing to actual description. I
 would say overall that we would do well to not gloat about our metadata
 until we've reviewed the facts on the ground. Luckily, now we can.
 Roy

 [1] http://www.oclc.org/research/publications/library/2010/2010-06.pdf

 On Mon, May 3, 2010 at 11:03 AM, Eric Lease Morgan emor...@nd.edu wrote:

  On May 3, 2010, at 1:55 PM, Karen Coyle wrote:
 
   1. MARC the data format -- too rigid, needs to go away
   2. MARC21 bib data -- very detailed, well over 1,000 different data
   elements, some well-coded data (not all); unfortunately trapped in #1
 
 
 
  The differences between the two points enumerated above, IMHO, seem to be
  at the heart of the never-ending debate between computer types and
  cataloger types when it comes to library metadata. The non-library
 computer
  types don't appreciate the value of human-aided systematic description.
 And
  the cataloger types don't understand why MARC is a really terrible bit
  bucket, especially considering the current environment. All too often the
  two camps don't know to what the other is speaking. MARC must die.
 Long
  live MARC.
 
  --
  Eric Lease Morgan
 



Re: [CODE4LIB] MODS and DCTERMS

2010-05-03 Thread Eric Lease Morgan
On May 3, 2010, at 2:47 PM, Aaron Rubinstein wrote:

 1. MARC the data format -- too rigid, needs to go away
 2. MARC21 bib data -- very detailed, well over 1,000 different data
 elements, some well-coded data (not all); unfortunately trapped in #1
 
 For the sake of my own understanding, I would love an explanation of the 
 distinction between #1 and #2...


Item #1

The first item (#1) is MARC, the data structure -- a container for holding 
various types of bibliographic information. From one of my older publications 
[1]:

  ...the MARC record is a highly structured piece of information.
  It is like a sentence with a subject, predicate, objects,
  separated with commas, semicolons, and one period. In data
  structure language, the MARC record is a hybrid sequential/random
  access record.

  The MARC record is made up of three parts: the leader, the
  directory, the bibliographic data. The leader (or subject in our
  analogy) is always represented by the first 24 characters of each
  record. The numbers and letters within the leader describe the
  record's characteristics. For example, the length of the record
  is in positions 1 to 5. The type of material the record
  represents (authority, bibliographic, holdings, et cetera) is
  signified by the character at position 7. More importantly, the
  characters from positions 13 to 17 represent the base. The base
  is a number pointing to the position in the record where the
  bibliographic information begins.
  
  The directory is the second part of a MARC record. (It is the
  predicate in our analogy.) The directory describes the record's
  bibliographic information with directory entries. Each entry
  lists the types of bibliographic information (items called
  tags), how long the bibliographic information is, and where the
  information is stored in relation to the base. The end of the
  directory and all variable length fields are marked with a
  special character, the ASCII character 30.
  
  The last part of a MARC record is the bibliographic information.
  (It is the object in our sentence analogy.) It is simply all the
  information (and more) on a catalog card. Each part of the
  bibliographic information is separated from the rest with the
  ASCII character 30. Within most of the bibliographic fields are
  indicators and subfields describing in more detail the fields
  themselves. The subfields are delimited from the rest of the
  field with the ASCII character 31.
  
  The end of a MARC record is punctuated with an end-of-record
  mark, ASCII character 29. The ASCII characters 31, 30, and 29
  represent our commas, semicolons, and periods, respectively.

At the time, MARC -- the data structure -- was really cool. Consider the 
environment in 1965. No hard disks. Tape drives instead. Data storage was 
expensive. The medium had to be read from beginning to end. No (or rarely any) 
random data access. Thus, the record and field lengths were relatively 
short. (No MARC record can be longer than 99,999 characters, and no MARC field 
can be longer than 9,999 characters.) Remember too the purpose of MARC -- to transmit 
the content of catalog cards. Given the leader, the directory, and the 
bibliographic sections of a MARC record all preceded by pseudo checksums and 
delimited by non-printable ASCII characters, the MARC record -- the data 
structure -- comes with a plethora of checks and balances. Very nice.

Fast forward to the present day. Disk space is cheap. Tapes are not the norm. 
More importantly, the wider computing environment uses XML as its data 
structure of choice. If libraries are about sharing information, then we need 
to communicate in that language. The language of the Net is XML, not 
MARC. Not only is MARC -- the data structure -- stuck on 50-year-old 
technology, but more importantly it is not the language of the people with whom 
we want to share.


Item #2

Our bibliographic data (item #2) is the metadata of the Web. While it is 
important, and it adds a great deal of value, it is not as important as it used 
to be. It too needs to change. Remember, MARC was originally designed to print 
catalog cards. Author. Title. Pagination. Series. Notes. Subject headings. 
Added entries. Looking back, these were relatively simple data elements, but 
what about system numbers? ISBN numbers? Holdings information? Tables of 
contents? Abstracts? Ratings? We have stuffed these things into MARC every 
which way and we call MARC flexible.

More importantly, and as many have said previously, string values in MARC 
records lead to maintenance nightmares. Instead, like a relational database 
model, values need to be described using keys -- pointers -- to the canonical 
values. This makes find/replace operations painless and enables the use of 
different languages, among numerous other advantages.
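
For instance, the difference is roughly this (fragments only, and the 
authority URI below is a made-up placeholder, not a real identifier):

<!-- a string value, carried -- and maintained -- in every record -->
<dcterms:creator>Kilgour, Frederick Gridley</dcterms:creator>

<!-- a pointer to a canonical value, maintained once, elsewhere -->
<dcterms:creator rdf:resource="http://example.org/authorities/names/12345"/>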

ISBD is also a pain. Take the following string:

  Kilgour, Frederick Gridley (1914–2006)

There is way too much punctuation going on here. Yes,