Re: [CODE4LIB] it's cool to hate on OpenURL (was: Twitter annotations...)
I'll try to find out. Sent from Eric Hellman's iPhone On May 2, 2010, at 4:10 PM, stuart yeates stuart.yea...@vuw.ac.nz wrote: But the interesting use case isn't OpenURL over HTTP, the interesting use case (for me) is OpenURL on a disconnected eBook reader resolving references from one ePub to other ePub content on the same device. Can OpenURL be used like that?
Re: [CODE4LIB] it's cool to hate on OpenURL (was: Twitter annotations...)
Here is the API response Umlaut provides to OpenURL requests with standard scholarly formats. This API response is of course to some extent customized to Umlaut's particular context/use cases; it was not necessarily intended to be any kind of standard -- certainly not with as wide-ranging an intended domain as OpenURL 1.0 (whose broader ambitions never really caught on). It's targeted at standard, actually-existing link resolver use cases in the scholarly environment. But, here you go, live even:

http://findit.library.jhu.edu/resolve/api?sid=google&auinit=AB&aulast=Miller&atitle=Reporting+results+of+cancer+treatment&id=doi:10.1002/1097-0142%2819810101%2947:1%3C207::AID-CNCR2820470134%3E3.0.CO%3B2-6&title=Cancer&volume=47&issue=1&date=2006&spage=207&issn=0008-543X

JSON is also available. Note that complete results do not necessarily show up at first; some information is still being loaded in the background. You can refresh the URL to see more results, and you'll know the back-end server has nothing left to give you when <complete>true</complete> is present. Another XML response with embedded HTML snippets is also available (in both XML and JSON):

http://findit.library.jhu.edu/resolve/partial_html_sections?sid=google&auinit=AB&aulast=Miller&atitle=Reporting+results+of+cancer+treatment&id=doi:10.1002/1097-0142%2819810101%2947:1%3C207::AID-CNCR2820470134%3E3.0.CO%3B2-6&title=Cancer&volume=47&issue=1&date=2006&spage=207&issn=0008-543X

Ross Singer wrote: On Fri, Apr 30, 2010 at 10:08 AM, Eric Hellman e...@hellman.net wrote: "OK, what does the EdSuRoSi spec for OpenURL responses say?" Well, I don't think it's up to us, and I think it's dependent upon community profile (more than Z39.88 itself), since it would be heavily influenced by what is actually trying to be accomplished. I think the basis of a response could actually be another context object, with the 'services' entity containing a list of services/targets formatted in some way appropriate to the context, and the referent entity enhanced with whatever the resolver can add to the puzzle. This could then be taken to another resolver for more services layered on. This is just riffing off the top of my head, of course... -Ross.
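For anyone wanting to generate requests like these programmatically, here is a minimal sketch in Python. The citation values are copied from the example URL above; the resolver base URL is of course JHU's, so substitute your own. This is an illustration, not part of the Umlaut API itself:

    from urllib.parse import urlencode

    # OpenURL 0.1-style key/value citation, as in the example above
    params = [
        ("sid", "google"),
        ("auinit", "AB"),
        ("aulast", "Miller"),
        ("atitle", "Reporting results of cancer treatment"),
        ("id", "doi:10.1002/1097-0142(19810101)47:1<207::AID-CNCR2820470134>3.0.CO;2-6"),
        ("title", "Cancer"),
        ("volume", "47"),
        ("issue", "1"),
        ("date", "2006"),
        ("spage", "207"),
        ("issn", "0008-543X"),
    ]
    base = "http://findit.library.jhu.edu/resolve/api"
    # urlencode percent-escapes the DOI's reserved characters for us
    print(base + "?" + urlencode(params))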
Re: [CODE4LIB] SRU/ZeeRex explain question : record schemas
Thanks Ray, I believe it is! A schema listed there is available for requesting with the recordSchema= parameter, yes? Cool, that's exactly what I was looking for. Another question, though. I note when looking up schemaInfo... I'm a bit confused by the sort attribute. How could you sort by a schema? What is this attribute actually for? Jonathan

Ray Denenberg, Library of Congress wrote: schemaInfo is what you're looking for, I think. Look at http://z3950.loc.gov:7090/voyager. Line 74, for example:

    <schemaInfo>
      <schema identifier="info:srw/schema/1/marcxml-v1.1" sort="false" name="marcxml">
        <title>MARCXML</title>
      </schema>

Is this what you're looking for? --Ray

- Original Message - From: Jonathan Rochkind rochk...@jhu.edu To: CODE4LIB@LISTSERV.ND.EDU Sent: Friday, April 30, 2010 3:57 PM Subject: [CODE4LIB] SRU/ZeeRex explain question : record schemas

This page: http://www.loc.gov/standards/sru/resources/schemas.html says: "The Explain document lists the XML schemas for a given database in which records may be transferred. Every schema is unambiguously identified by a URI, and a server may assign a short name, which may or may not be the same as the short name listed in the table below (and may differ from the short name that another server assigns)." But perusing the SRU/ZeeRex Explain documentation I've been able to find, I've been unable to find WHERE in the Explain document this information is listed/advertised. Can anyone clue me in?
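Putting the two halves of this exchange together: a schema short name advertised under schemaInfo is the value you hand to the recordSchema parameter on a searchRetrieve request. An untested illustration against the LC demo server mentioned above:

    http://z3950.loc.gov:7090/voyager?version=1.1&operation=searchRetrieve&query=dinosaur&maximumRecords=1&recordSchema=marcxml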
Re: [CODE4LIB] It's cool to love milk and cookies
me too. On Sun, May 2, 2010 at 9:23 PM, Rosalyn Metz rosalynm...@gmail.com wrote: I like oreo double stuff. I take one cookie off each sandwich and then take two sides with cream and sandwich them together. Voila. Oreo quadruple stuff. On May 2, 2010 4:05 PM, Michael J. Giarlo leftw...@alumni.rutgers.edu wrote: EMACS -Mike On Sun, May 2, 2010 at 14:12, Mark Pernotto mark.perno...@gmail.com wrote: I like heavy whipp...
Re: [CODE4LIB] SRU/ZeeRex explain question : record schemas
From: Jonathan Rochkind rochk...@jhu.edu "Another question though. I note when looking up schemaInfo... I'm a bit confused by the sort attribute. How could you sort by a schema? What is this attribute actually for?"

Well, indulge me; this is best explained by the current OASIS SRU draft. (The current and earlier specs don't do a good job here. But for background, if interested: sorting as an SRU function was supported in SRU 1.1 and taken out of version 1.2, replaced by sorting as a function of the query language rather than the protocol. For the OASIS work it's in both. For the current spec at LC, which reflects 1.2, the attribute doesn't even make sense. If you go back to the 1.1 archive it does. Still, the OASIS document treats it more clearly.) See http://www.loc.gov/standards/sru/oasis/sru-2-0-draft-most-current.doc, section 9.1.

So essentially, when you sort in SRU, you provide an XPath expression. The XPath expression is meaningful in the context of a schema, but the *record schema* may not be the most meaningful schema for purposes of sorting; there may be another schema that is more meaningful. So, you have the capability to specify not only a record schema but an auxiliary sort schema. A given schema that an Explain file lists will usually be one that is used as a record schema, but it may also be usable as a sort schema. That's what the sort attribute tells you. --Ray
Re: [CODE4LIB] MODS and DCTERMS
Hi MJ,

"- for that matter, is there a good example of how to properly serialize DCTERMS for eg. a converted MARC/MODS record in XML (or RDF/XML)? I see, eg. http://dublincore.org/documents/dcq-rdf-xml/ which has been replaced by http://dublincore.org/documents/dc-rdf/ but I'm not sure if the latter obviates the former entirely? Also, the examples at the bottom of the latter don't show, eg. repeated elements or DCMES elements. Do we abandon http://purl.org/dc/elements/1.1/ entirely?"

This has always been ridiculously confusing! Here's my understanding (though anyone else, please chime in and correct me if I've misunderstood):

- With the maturation of the DCMI Abstract Model http://dublincore.org/documents/abstract-model/, new bindings were needed to express features of the model not obvious in the old RDF, XML, and XHTML bindings.

- For RDF, http://dublincore.org/documents/dc-rdf/ is stable and fully intended to replace http://dublincore.org/documents/dcq-rdf-xml/.

- For XML (the non-RDF sort), the most current document is http://dublincore.org/documents/dc-ds-xml/, though note its status is still (after 18 months) only a proposed recommendation. This document itself replaces a transition document http://dublincore.org/documents/2006/05/29/dc-xml/ from 2006 that never got beyond Working Draft status. To get a stable XML binding, you have to go all the way back to 2003 http://dublincore.org/documents/dc-xml-guidelines/index.shtml, a binding which predates much of the current DCMI Abstract Model.

- Many found the 2003 XML binding unsatisfactory in that it prescribed the format for individual dc and dcterms properties, but not a full XML format - that is, there was no DC-sanctioned XML root element for a qualified DC record. (This gets at the very heart of the difference in perspective between RDF and XML, properties and elements, etc., I think, but I digress...) The folks I'm aware of that developed workarounds for this were those sharing QDC over OAI-PMH. I find the UIUC OAI registry http://oai.grainger.uiuc.edu/registry/ helpful for investigations of this sort. A quick glance at their report on Distinct Metadata Schemas used in OAI-PMH data providers http://oai.grainger.uiuc.edu/registry/ListSchemas.asp seems to suggest that CONTENTdm uses this schema for QDC http://epubs.cclrc.ac.uk/xsd/qdc.xsd and DSpace uses this one http://dublincore.org/schemas/xmls/qdc/2006/01/06/dcterms.xsd. The latter doesn't actually define a root element either, but since here at least the QDC is inside the wrappers the OAI-PMH response requires, it's well-formed. What someone does with that once they get it and unpack it, I don't know, since without a container it won't be well-formed XML. The former goes through several levels of importing other things and eventually ends up importing from an .xsd on the Dublin Core site, but they define a root element themselves along the way. (I think.)

- So what does one do? I guess it depends on who your target consumers of this data are. If you're looking to work with more traditional library environments, perhaps those that are using CONTENTdm, etc., the legacy hack-ish format might be the best. (I'm part of an initiative to revitalize the Sheet Music Consortium http://digital.library.ucla.edu/sheetmusic/ and lots of our potential contributors are CONTENTdm users, so I think this is the direction I'm going to take that project.) But if you're wanting to talk to DCMI-style folks, the dc-ds-xml, or more likely the dc-rdf option, seems more attractive.
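To make the missing-root-element problem concrete, here is the general shape of QDC record one tends to see in those OAI-PMH feeds. The <qualifieddc> wrapper name and the element choices are purely illustrative, since the wrapper is exactly the part DCMI never standardized:

    <qualifieddc xmlns:dc="http://purl.org/dc/elements/1.1/"
                 xmlns:dcterms="http://purl.org/dc/terms/">
      <dc:title>An example title</dc:title>
      <dc:creator>Example, Author</dc:creator>
      <dcterms:issued>1910</dcterms:issued>
      <dcterms:isPartOf>An example collection</dcterms:isPartOf>
    </qualifieddc>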
I'm afraid I'm not much help with the implementation details of dc-rdf, though. One of the DC mailing lists would be, though, I suspect. There are a lot of active members there. Ick, huh? :-) Jenn

Jenn Riley Metadata Librarian Digital Library Program Indiana University - Bloomington Wells Library W501 (812) 856-5759 www.dlib.indiana.edu Inquiring Librarian blog: www.inquiringlibrarian.blogspot.com
Re: [CODE4LIB] SRU/ZeeRex explain question : record schemas
This makes some amount of sense, thanks. I actually kind of liked the sorting as part of CQL in SRU 1.2. I see how XPath sorting can be convenient too. But you will leave sorting as part of CQL too in any changes to CQL specs, I hope? I think CQL has a lot of use even outside of SRU proper, so I encourage you to keep its spec not too tightly coupled to SRU.

I think there are at least three ways to sort as part of (different versions of?) SRU now! 1) An actual separate sortKeys query parameter. 2) Included in the CQL expression in query, using the sortBy keyword. 3) In the draft, not finalized: OASIS/SRU 2.0 methods of specifying XPaths for sorting. [Thanks for including the link to the current SRU 2.0 draft; I didn't know that was publicly available anywhere, and it's not really googleable.] Do I have this right?

As SRU 1.2 is the only actual spec I have to work with... am I right that either top-level sortKeys, or embedded in CQL with sortBy, would both be legal in SRU 1.2? (Whether a given server supports one or both of them is a different question -- but they are both legal to spec, yes?) I'd actually strongly encourage you to leave both of them as legal to spec in SRU 2.0; they make things much simpler to work with (although also less flexible; that's generally the trade-off) than requiring XPaths to be specified -- especially when a corpus being searched may include records in diverse, varied, and inconsistent record schemas. Jonathan

Ray Denenberg, Library of Congress wrote: From: Jonathan Rochkind rochk...@jhu.edu "Another question though. I note when looking up schemaInfo... I'm a bit confused by the sort attribute. How could you sort by a schema? What is this attribute actually for?" Well, indulge me; this is best explained by the current OASIS SRU draft. (The current and earlier specs don't do a good job here. But for background, if interested: sorting as an SRU function was supported in SRU 1.1 and taken out of version 1.2, replaced by sorting as a function of the query language rather than the protocol. For the OASIS work it's in both. For the current spec at LC, which reflects 1.2, the attribute doesn't even make sense. If you go back to the 1.1 archive it does. Still, the OASIS document treats it more clearly.) See http://www.loc.gov/standards/sru/oasis/sru-2-0-draft-most-current.doc, section 9.1. So essentially, when you sort in SRU, you provide an XPath expression. The XPath expression is meaningful in the context of a schema, but the *record schema* may not be the most meaningful schema for purposes of sorting; there may be another schema that is more meaningful. So, you have the capability to specify not only a record schema but an auxiliary sort schema. A given schema that an Explain file lists will usually be one that is used as a record schema, but it may also be usable as a sort schema. That's what the sort attribute tells you. --Ray
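For the archives, roughly what options 1 and 2 look like on the wire (index names illustrative). In SRU 1.1 the sort is a separate request parameter alongside the query, in its simplest single-key form something like:

    ...&query=dc.title+%3D+dinosaur&sortKeys=dc.date

while in CQL 1.2 the sort rides inside the query expression itself:

    dc.title = dinosaur sortBy dc.date/sort.descending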
Re: [CODE4LIB] MODS and DCTERMS
I'm still confused about all this stuff too, but I've often seen the oai_dc format (for OAI-PMH, I think?) used as a 'standard' way to expose simple DC attributes. One thing I was confused about was whether the oai_dc format _required_ the use of the old-style DC URIs, or also allowed the use of the dcterms URIs? Anyone know? I kind of think it actually requires the old-style DC URIs, as it was written before dcterms. At least it is one standardized way to expose the old basic DC elements, with a specific XML schema. Jonathan

Riley, Jenn wrote: Hi MJ, "- for that matter, is there a good example of how to properly serialize DCTERMS for eg. a converted MARC/MODS record in XML (or RDF/XML)? I see, eg. http://dublincore.org/documents/dcq-rdf-xml/ which has been replaced by http://dublincore.org/documents/dc-rdf/ but I'm not sure if the latter obviates the former entirely? Also, the examples at the bottom of the latter don't show, eg. repeated elements or DCMES elements. Do we abandon http://purl.org/dc/elements/1.1/ entirely?" This has always been ridiculously confusing! Here's my understanding (though anyone else, please chime in and correct me if I've misunderstood): - With the maturation of the DCMI Abstract Model http://dublincore.org/documents/abstract-model/, new bindings were needed to express features of the model not obvious in the old RDF, XML, and XHTML bindings. - For RDF, http://dublincore.org/documents/dc-rdf/ is stable and fully intended to replace http://dublincore.org/documents/dcq-rdf-xml/. - For XML (the non-RDF sort), the most current document is http://dublincore.org/documents/dc-ds-xml/, though note its status is still (after 18 months) only a proposed recommendation. This document itself replaces a transition document http://dublincore.org/documents/2006/05/29/dc-xml/ from 2006 that never got beyond Working Draft status. To get a stable XML binding, you have to go all the way back to 2003 http://dublincore.org/documents/dc-xml-guidelines/index.shtml, a binding which predates much of the current DCMI Abstract Model. - Many found the 2003 XML binding unsatisfactory in that it prescribed the format for individual dc and dcterms properties, but not a full XML format - that is, there was no DC-sanctioned XML root element for a qualified DC record. (This gets at the very heart of the difference in perspective between RDF and XML, properties and elements, etc., I think, but I digress...) The folks I'm aware of that developed workarounds for this were those sharing QDC over OAI-PMH. I find the UIUC OAI registry http://oai.grainger.uiuc.edu/registry/ helpful for investigations of this sort. A quick glance at their report on Distinct Metadata Schemas used in OAI-PMH data providers http://oai.grainger.uiuc.edu/registry/ListSchemas.asp seems to suggest that CONTENTdm uses this schema for QDC http://epubs.cclrc.ac.uk/xsd/qdc.xsd and DSpace uses this one http://dublincore.org/schemas/xmls/qdc/2006/01/06/dcterms.xsd. The latter doesn't actually define a root element either, but since here at least the QDC is inside the wrappers the OAI-PMH response requires, it's well-formed. What someone does with that once they get it and unpack it, I don't know, since without a container it won't be well-formed XML. The former goes through several levels of importing other things and eventually ends up importing from an .xsd on the Dublin Core site, but they define a root element themselves along the way. (I think.) - So what does one do?
I guess it depends on who your target consumers of this data are. If you're looking to work with more traditional library environments, perhaps those that are using CONTENTdm, etc., the legacy hack-ish format might be the best. (I'm part of an initiative to revitalize the Sheet Music Consortium http://digital.library.ucla.edu/sheetmusic/ and lots of our potential contributors are CONTENTdm users, so I think this is the direction I'm going to take that project.) But if you're wanting to talk to DCMI-style folks, the dc-ds-xml, or more likely the dc-rdf option, seems more attractive. I'm afraid I'm not much help with the implementation details of dc-rdf, though. One of the DC mailing lists would be, though, I suspect. There are a lot of active members there. Ick, huh? :-) Jenn

Jenn Riley Metadata Librarian Digital Library Program Indiana University - Bloomington Wells Library W501 (812) 856-5759 www.dlib.indiana.edu Inquiring Librarian blog: www.inquiringlibrarian.blogspot.com
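Following up on the oai_dc question above: for reference, a minimal oai_dc record looks like the following (element values made up). As far as I can tell, the oai_dc schema only admits the fifteen http://purl.org/dc/elements/1.1/ elements, which is why it cannot carry dcterms refinements:

    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>An example title</dc:title>
      <dc:creator>Example, Author</dc:creator>
      <dc:date>1910</dc:date>
    </oai_dc:dc>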
[CODE4LIB] SRU/ZeeRex explain question : CQL version?
Ah, I think I was wrong below. I must have been looking at different versions of the SRU spec without realizing it. SRU 1.1 includes a sortKeys parameter, and CQL 1.1 does not include a sortBy clause. SRU 1.2 does NOT include a sortKeys parameter, and CQL 1.2 does include a sortBy clause. Okay: in the SRU/ZeeRex explain document, how do you advertise which version of CQL you support, 1.1 or 1.2? Or is this just implied by which version of SRU you support, 1.1 or 1.2? And how do you advertise THAT in an SRU/ZeeRex explain? Jonathan

Jonathan Rochkind wrote: I think there are at least three ways to sort as part of (different versions of?) SRU now! 1) An actual separate sortKeys query parameter. 2) Included in the CQL expression in query, using the sortBy keyword. 3) In the draft, not finalized: OASIS/SRU 2.0 methods of specifying XPaths for sorting. [Thanks for including the link to the current SRU 2.0 draft; I didn't know that was publicly available anywhere, and it's not really googleable.]
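For what it's worth, the SRU version itself is normally advertised in the version attribute on serverInfo in the explain record, something like the sketch below (attribute names as I recall them from the ZeeRex schema; values illustrative). Whether that also pins down the CQL version is exactly the open question here:

    <serverInfo protocol="SRU" version="1.2" transport="http">
      <host>z3950.loc.gov</host>
      <port>7090</port>
      <database>voyager</database>
    </serverInfo>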
Re: [CODE4LIB] Handling non-Unicode characters (was: Unicode persistence)
Hi Stuart,

"These have been included because they are in widespread use in a current written culture. The problems I personally have are down to characters used by a single publisher in a handful of books more than a hundred years ago. Such characters are explicitly excluded from Unicode. In the early period of the standardisation of the Māori language there were several competing ideas of what to use as a character set. One of those included a 'wh' ligature as a character. Several works were printed using this ligature. This ligature does not qualify for inclusion in Unicode."

That is a matter of discussion. If you do not call it a 'ligature', the chances of getting it included are higher.

"To see how we handle the text, see: http://www.nzetc.org/tm/scholarly/tei-Auc1911NgaM-t1-body-d4.html The underlying representation is TEI/XML, which has a mechanism to handle such glyphs. The things I'm still unhappy with are: * getting reasonable results when users cut-n-paste the text/image HTML combination to some other application * some browsers still like line-breaking on images in the middle of words"

That's interesting, and reminds me of the treatment of mathematical formulas in journal titles, which mostly end up as ugly images. In Unicode you are allowed to assign private characters: http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters#Private_use_characters The U+200D ZERO WIDTH JOINER could also be used, but most browsers will not support it - and you need a font that supports your character anyway. http://blogs.msdn.com/michkap/archive/2006/02/15/532394.aspx

In summary: Unicode is just a subset of all characters that have been used for written communication, and whether a character gets included depends not only on objective properties but on lobbying and other circumstances. The deeper you dig, the nastier Unicode gets - as with all complex formats and standards.

Cheers, Jakob

P.S.: Michael Kaplan's blog also contains a funny article about emoji: http://blogs.msdn.com/michkap/archive/2010/04/27/10002948.aspx -- Jakob Voß jakob.v...@gbv.de, skype: nichtich Verbundzentrale des GBV (VZG) / Common Library Network Platz der Goettinger Sieben 1, 37073 Göttingen, Germany +49 (0)551 39-10242, http://www.gbv.de
Re: [CODE4LIB] It's cool to love milk and cookies
You know, there are some of us who are milk intolerant on this mailing list. And emacs intolerant, too. (although, I did use 'ee' as my editor in elm, but elm took too long to support MIME, so I switched to pine, with their pico default editor, but I don't use any of those I mentioned for coding, even though I am in pico/pine right now, as I still haven't switched to alpine or mutt) -Joe
Re: [CODE4LIB] It's cool to love milk and cookies
But is there a NISO standard for this? On Fri, Apr 30, 2010 at 7:13 PM, Simon Spero s...@unc.edu wrote: I like chocolate milk.
Re: [CODE4LIB] It's cool to love milk and cookies
C-u 2 double-stuff Aaron On 5/2/2010 9:23 PM, Rosalyn Metz wrote: I like oreo double stuff. I take one cookie off each sandwich and then take two sides with cream and sandwich them together. Voila. Oreo quadruple stuff. On May 2, 2010 4:05 PM, Michael J. Giarlo leftw...@alumni.rutgers.edu wrote: EMACS -Mike On Sun, May 2, 2010 at 14:12, Mark Pernotto mark.perno...@gmail.com wrote: I like heavy whipp...
Re: [CODE4LIB] It's cool to love milk and cookies
I believe there is an organization called NABISCO that is working on one. --jay On Mon, May 3, 2010 at 10:40 AM, Ross Singer rossfsin...@gmail.com wrote: But is there a NISO standard for this? On Fri, Apr 30, 2010 at 7:13 PM, Simon Spero s...@unc.edu wrote: I like chocolate milk.
Re: [CODE4LIB] Handling non-Unicode characters (was: Unicode persistence)
Hmm, you could theoretically assign chars in the Unicode Private Use Area to the chars you need -- but then have your application replace those chars with small images on rendering/display. This seems as clean a solution as you are likely to find. Your TEI solution still requires chars-as-images for these unusual chars, right? So this is no better with regard to copying-and-pasting, browser display, and general interoperability than your TEI solution, but no worse either -- it's pretty much the same thing. But it may be better in terms of those considerations for chars that actually ARE currently Unicode codepoints. If any of your private chars later become non-private Unicode codepoints, you could always globally replace your private codepoints with the new standard ones.

With 137K private codepoints available, you _probably_ wouldn't run out. I think. You could try standardizing these private codepoints among people in contexts/communities similar to yours -- it looks like there are several existing efforts to document shared uses of private codepoints for chars that do not have official Unicode codepoints. They are mentioned in the Wikipedia article.

[Reading that Wikipedia article taught me something new I didn't know about MARC21 and Unicode too -- a topic generally on top of my pile these days: "The MARC 21 standard uses the [Private Use Area] to encode East Asian characters present in MARC-8 that have no Unicode encoding." Who knew?]

Jonathan

Jakob Voss wrote: Hi Stuart, "These have been included because they are in widespread use in a current written culture. The problems I personally have are down to characters used by a single publisher in a handful of books more than a hundred years ago. Such characters are explicitly excluded from Unicode. In the early period of the standardisation of the Māori language there were several competing ideas of what to use as a character set. One of those included a 'wh' ligature as a character. Several works were printed using this ligature. This ligature does not qualify for inclusion in Unicode." That is a matter of discussion. If you do not call it a 'ligature', the chances of getting it included are higher. "To see how we handle the text, see: http://www.nzetc.org/tm/scholarly/tei-Auc1911NgaM-t1-body-d4.html The underlying representation is TEI/XML, which has a mechanism to handle such glyphs. The things I'm still unhappy with are: * getting reasonable results when users cut-n-paste the text/image HTML combination to some other application * some browsers still like line-breaking on images in the middle of words" That's interesting, and reminds me of the treatment of mathematical formulas in journal titles, which mostly end up as ugly images. In Unicode you are allowed to assign private characters: http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters#Private_use_characters The U+200D ZERO WIDTH JOINER could also be used, but most browsers will not support it - and you need a font that supports your character anyway. http://blogs.msdn.com/michkap/archive/2006/02/15/532394.aspx In summary: Unicode is just a subset of all characters that have been used for written communication, and whether a character gets included depends not only on objective properties but on lobbying and other circumstances. The deeper you dig, the nastier Unicode gets - as with all complex formats and standards. Cheers, Jakob P.S.: Michael Kaplan's blog also contains a funny article about emoji: http://blogs.msdn.com/michkap/archive/2010/04/27/10002948.aspx
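A minimal sketch of the render-time replacement described above, in Python. The codepoint assignment and image path are made up for illustration; U+E000 is simply the first codepoint of the Basic Multilingual Plane's Private Use Area:

    # Map private-use codepoints to glyph images when producing HTML.
    PUA_GLYPHS = {
        "\ue000": '<img class="glyph" src="/glyphs/wh-ligature.png" alt="wh"/>',
    }

    def render_html(text):
        # Substitute each mapped private-use character with its image tag.
        for char, img in PUA_GLYPHS.items():
            text = text.replace(char, img)
        return text

    # "\ue000akapapa" renders as the wh-ligature image followed by "akapapa"
    print(render_html("\ue000akapapa"))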
[CODE4LIB] [job posting] Systems Programmer, University of Michigan Library IT
The University of Michigan Library is looking for a talented, resourceful systems programmer to develop and maintain software systems. A principal activity at the library is the development of a massive digital archiving infrastructure to support our scanning partnership with Google; the archive currently contains nearly 6 million items (220 TB) and is projected to grow to over 10 million items (400 TB) over the duration of the project. Programming projects will initially consist of enhancing the systems that receive and manage images from Google (including substantial work with validating incoming data and diagnosing data problems), large-scale transformation of textual and image data, designing/developing core digital library infrastructure, and monitoring reliability and performance of services. Projects may include server and storage administration, depending on candidate interest and ability. Other tasks will vary but include, for example, preparing documentation and monitoring technology trends.

BACKGROUND: The Library Information Technology (LIT) division provides comprehensive technology support and guidance for the University Library system, including hosting digital library collections, coordinating electronic publishing initiatives, and supporting traditional library services (circulation of materials and management of metadata). The Core Services unit of LIT concentrates on server infrastructure, systems integration, and automation of workflows for the library system. Core Services undertakes projects in a number of technology areas, including (for example) server deployment and administration, automation, access control systems used daily by the University community, and distributed systems that manage the flow of millions of scanned page images per week. Core Services operates a growing server infrastructure based primarily on Linux, but partially on Solaris, consisting of approximately 80 servers and over 800 TB of storage spread across three data centers.

DEPARTMENT QUALIFICATIONS: Minimum: Bachelor's degree in computer science or an equivalent combination of education and experience; demonstrated programming abilities in any applicable language; strong analytical and troubleshooting skills; excellent verbal and written communication skills. Desired: Demonstrated expertise with DAS, NAS, and SAN storage systems; demonstrated experience in Linux/Solaris administration; demonstrated experience in database administration; demonstrated experience with developing XSLT transformations.

NOTE: This is a 2-year term position.

NOTE: Salary is dependent on education and previous relevant experience.

TO APPLY: Apply online by Monday, May 17 using the University of Michigan Jobs website at http://www.umich.edu/jobs . This position is posted as number 39327, and can be found by searching for the keyword "google".
Re: [CODE4LIB] MODS and DCTERMS
Out of curiosity, what is your use case for turning this into DC? That might help those of us that are struggling to figure out where to start with trying to help you with an answer. -Ross. On Mon, May 3, 2010 at 11:46 AM, MJ Suhonos m...@suhonos.ca wrote: Thanks for your comments, guys. I was beginning to think the lack of response indicated that I'd asked something either heretical or painfully obvious. :-) That's my understanding as well. oai_dc predates the defining of the 15 legacy DC properties in the dcterms namespace, and it's my guess nobody saw a reason to update the oai_dc definition after this happened. This is at least part of my use case — we do a lot of work with OAI on both ends, and oai_dc is pretty limited due to the original 15 elements. My thinking at this point is that there's no reason we couldn't define something like oai_dcterms and use the full QDC set based on the updated profile. Right? FWIW, I'm not limited to any legacy ties; in fact, my project is aimed at pushing the newer, DC-sanctioned ideas forward, so I suspect in my case using an XML serialization that validates against http://purl.org/dc/terms/ is probably sufficient (whether that's RDF or not doesn't matter at this point). So, back to the other part of the question: has anybody seen a MODS — DCTERMS crosswalk in the wild? It looks like there's a lot of similarity between the two, but before I go too deep down that rabbit hole, I'd like to make sure someone else hasn't already experienced that, erm, joy. MJ
Re: [CODE4LIB] MODS and DCTERMS
"dcterms is so terribly lossy that it would be a shame to reduce MARC to it."

This is *precisely* the other half of my rationale -- a shame? Why? If MARC is the mind prison that some purport it to be, then let's see what a system built devoid of MARC, but based on the best alternative we have, looks like. That may well *not* be DCTERMS, but I do like the DCAM model, and there are plenty of non-library systems out there that speak simple DC (OAI-PMH is one example from this thread alone). Being conceptually RDF-compatible is just a bonus for me.

"This would be an incentive for them to at least consider implementing DCTERMS, which may be terribly lossy compared to MARC, but is a huge increase in expressivity compared to simple DC. Integrating MARC-based records and DC-based records from OAI sources in a single database could be a useful thing to play with. What we need, ASAP, is a triple form of MARC (and I know some folks have experimented with this...) and a translation from MARC to the RDA elements that have been registered in RDF. However, I hear that JSC is going to be adding more detail to the RDA elements, so that could mean changes coming down the pike. I am interested in working on MARC as triples, which I see as a transformation format. I have a database of MARC elements that might be a crude basis for this."

This seems like it's looking to accomplish different goals than I am, but obviously if there's a MARC-as-triples intermediary that's workable *today* then I'd be happy to use that instead. But I wonder: how navigable is it by people who don't understand MARC? How much loss is potentially involved?

"QDC basically represents the same things as dcterms, so you can probably just take the existing XSLT and hack on it until it represents something that looks more like dcterms than qdc."

Yeah, that might be easier than mapping from MODS, though I'll have to see how much I can look at a MARC-based XSLT before my brain melts. Hopefully it wouldn't take *too* much work.

"That won't address the issue of breaking up the MARC into individual resources, however. You mention that you are looking for the short hop to RDF, but this is just going to give you a big pile of literals for things like creator/contributor/subject, etc. I'm not really sure what the win would be, there."

Well, a MARC-as-triples approach would suffer from the same problem just as much, at least initially. I think the issue of converting literals into URIs is an important second step, but let's get the literals into a workable format first. I should clarify that my ultimate goal isn't to find a magical easy way to RDF, but rather to try to realize a way for libraries to get their data into a format that others are able and willing to play with. I'm betting on the notion that the majority of (presumably non-librarian) users would rather have incomplete data in a format that they can understand and manipulate, rather than have to learn MARC. I certainly would, and I'm a librarian (though probably a poor one because I don't understand or highly value MARC). Naive? Heretical? Probably. But worth a shot, I think. MJ
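Since MARC-as-triples keeps coming up: a rough sketch of the quick-and-dirty, literals-only first pass discussed above, using pymarc and rdflib. The record URI pattern and the field-to-property mapping are my own guesses for illustration, not any standard crosswalk:

    from pymarc import MARCReader
    from rdflib import Graph, Literal, Namespace, URIRef

    DCTERMS = Namespace("http://purl.org/dc/terms/")

    g = Graph()
    with open("records.mrc", "rb") as fh:
        for i, record in enumerate(MARCReader(fh)):
            subject = URIRef("http://example.org/record/%d" % i)  # made-up URIs
            title = record["245"]
            if title is not None:
                g.add((subject, DCTERMS.title, Literal(title.format_field())))
            # Dump personal-name fields as plain literals -- the "big pile of
            # literals" problem, left for a second pass to turn into URIs.
            for field in record.get_fields("100", "700"):
                g.add((subject, DCTERMS.creator, Literal(field.format_field())))

    print(g.serialize(format="turtle"))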
Re: [CODE4LIB] MODS and DCTERMS
NB: When Karen Coyle, Eric Morgan, and Roy Tennant all reply to your thread within half an hour of each other, you know you've hit the big time. Time to retire young I think. That would be Eric *Lease* Morgan — oh my god, you're right! I'm already losing data! It *is* insidious! I repent! MJ
Re: [CODE4LIB] MODS and DCTERMS
On 5/3/2010 1:55 PM, Karen Coyle wrote: 1. MARC the data format -- too rigid, needs to go away 2. MARC21 bib data -- very detailed, well over 1,000 different data elements, some well-coded data (not all); unfortunately trapped in #1 For the sake of my own understanding, I would love an explanation of the distinction between #1 and #2... Re: #2, how is bibliographic data encoded in MARC any different than bibliographic data encoded in some other format? Without the encoding format, you just have a pile of strings, right? I agree that we have lots of rich bibliographic data encoded in MARC and it is an exciting possibility to move it out of MARC into other, more flexible formats. Why, then, do we need to migrate the 'elements' of the encoding format as well? Taking one look at MARCXML makes it clear that the structure of MARC is not well suited to contemporary, *interoperable*, data formats. Is there something specific to MARC that is not potentially covered by MODS/DCTERMS/BIBO/??? that I'm missing? Thanks, Aaron
Re: [CODE4LIB] MODS and DCTERMS
On Mon, May 3, 2010 at 2:40 PM, MJ Suhonos m...@suhonos.ca wrote: "Yes, even to me as a librarian but not a cataloguer, many (most?) of these elements seem like overkill. I have no doubt there is an edge-case for having this fine level of descriptive detail, but I wonder: a) what proportion of records have this level of description b) what kind of (or how much) user access justifies the effort in creating and preserving it"

On many levels, I agree. Or I wish I could. If you look at a business model like Amazon, for example, it's easy to imagine that their overriding goal is, "Make the easy-to-find stuff ridiculously easy to find." The revenue they get from someone finding an edge-case book is exactly the same as the revenue they get from someone buying Harry Potter. The ROI is easy to think about. But I work in an academic library. In a lot of ways, our *primary audience* is some grad student 12 years from now who needs one trivial piece of crap to make it all come together in her head. I know we have thousands of books that have never been looked at, but computing the ROI on someone being able to see them some day is difficult. Maybe it's zero. Maybe not. We just can't tell. Now, none of this is to say that MARC/AACR2 is necessarily the best (or even a good) way to go about making these works findable. I'm just saying that evaluating the edge cases in terms of user access is a complicated business. -Bill- -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] MODS and DCTERMS
Although I agree with Roy's suggestion that librarians not gloat about our metadata, the notion that the value of a data element can be elicited from the frequency of its use in the overall domain of library materials is misleading and contrary to the report Roy cites. The sub-section of the very useful and informative OCLC report that Roy cites is very good on this point. Section 2, "MARC Tag Usage in WorldCat" by Karen Smith-Yoshimura, clearly lays out the data in the context of WorldCat and the cataloging practice of the OCLC members.

Library holdings are dominated by texts, and in terms of titles cataloged, texts are dominated by books. This preponderance of books tilts the ratios of use of individual data elements. Many data elements pertain to either a specific form of material (manuscripts, for instance) or to specific content (musical notation, for instance). Some pertain to both (manuscript scores, for instance). Within the total aggregate of library materials, data elements that are specific to a material or content type do not rise in usage rates to anything near 20% of the aggregate total of titles. Yet these elements are necessary or valuable to those wishing to discover and use the materials, and when one recalls that a 1% use rate in WorldCat equals about 1,000,000 titles, the usefulness of many MARC data elements can be seen as widespread. According to the report, 69 MARC tags occur in more than 1% of the records in WorldCat. That is quite a few more than Roy's 11, but even accounting for Karen's data elements being equivalent to the number of MARC sub-fields, this is far fewer than the 1,000 data elements available to a cataloger in MARC.

Matthew Beacom

By the way, the descriptive fields used in more than 20% of the MARC records in WorldCat are:

245 Title statement 100%
260 Imprint statement 96%
300 Physical description 91%
100 Main entry - personal name 61%
650 Subject added entry - topical term 46%
500 General note 44%
700 Added entry - personal name 28%

They answer, more or less, a few basic questions a user might have about the material: What is it called? Who made it? When was it made? How big is it? What is it about? Answers to the question "How can I get it?" are usually given in the associated MARC holdings record.

-Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Roy Tennant Sent: Monday, May 03, 2010 2:15 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] MODS and DCTERMS

I would even argue with the statement "very detailed, well over 1,000 different data elements, some well-coded data (not all)". There are only 11 (yes, eleven) MARC fields that appear in 20% or more of MARC records currently in WorldCat[1], and at least three of those elements are control numbers or other elements that contribute nothing to actual description. I would say overall that we would do well to not gloat about our metadata until we've reviewed the facts on the ground. Luckily, now we can. Roy

[1] http://www.oclc.org/research/publications/library/2010/2010-06.pdf

On Mon, May 3, 2010 at 11:03 AM, Eric Lease Morgan emor...@nd.edu wrote: On May 3, 2010, at 1:55 PM, Karen Coyle wrote: 1. MARC the data format -- too rigid, needs to go away 2.
MARC21 bib data -- very detailed, well over 1,000 different data elements, some well-coded data (not all); unfortunately trapped in #1

The differences between the two points enumerated above, IMHO, seem to be at the heart of the never-ending debate between computer types and cataloger types when it comes to library metadata. The non-library computer types don't appreciate the value of human-aided systematic description. And the cataloger types don't understand why MARC is a really terrible bit bucket, especially considering the current environment. All too often the two camps don't understand what the other is saying. MARC must die. Long live MARC. -- Eric Lease Morgan
Re: [CODE4LIB] A call for your OPAC (or other system) statistics! (Browse interfaces)
The stats reported in this paper might help: http://homes.ukoln.ac.uk/~kg249/publ/RenardusFinal.pdf

-Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Bill Dueber Sent: 03 May 2010 19:09 To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] A call for your OPAC (or other system) statistics! (Browse interfaces)

I got email from a person today saying, and I quote, "I must say that [the lack of a browse interface] come as a shock (*which interface cannot browse??*)" [Emphasis mine] Here, a browse interface is one where you can get a giant list of all the titles/authors/subjects, whatever -- a view on the data devoid of any searching. Will those of you out there with browse interfaces in your system take a couple minutes to send along a guesstimate of what percentage of patron sessions involve their use? [Note that for right now, I'm excluding type-ahead search boxes, although there's an obvious and, in my mind, strong argument to be made that they're substantially similar for many types of data.]

We don't have a browse interface on our (VuFind) OPAC right now. But in the interest of paying it forward, I can tell you that Mirlyn, our OPAC, has numbers like this. Pct of Mirlyn sessions, Feb/March/April 2010, which included at least one basic search and also:

Go to full record view: 46% (we put a lot of info in search results)
Select/favorite an item: 15%
Add a facet: 13%
Export record(s) to email/refworks/RIS/etc.: 3.4%
Send to phone (sms): 0.21%
Click on faq/help/AskUs in footer: 0.17% (324 total)

Based on 187,784 sessions, 2010.02.01 to 2010.04.31.

So... anyone out there able to tell me anything about browse interfaces? -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] A call for your OPAC (or other system) statistics! (Browse interfaces)
Bill,

Here are relative percentages for our Horizon catalog, based on our 2008-2009 annual report:

Browse Searches: 76.2%
Keyword Searches: 20.9%
Multi-index Searches: 2.9%

That interface presents a browse search box before a keyword search box, so browses are encouraged by the UI. That said, we did a study with our graduate students this year, and they rely on browse searches for some of their academic work. One is the use of subject and author browses, which lets the student feel confident that they have been exhaustive in their searching in their area of research. This can possibly be accommodated in other ways. In addition to known-item searching, our grad students also use title browse to be confident that we do _not_ own something. In our relevance-ranked interface, sometimes the scholar may blame relevance ranking for hiding a title from them which we don't actually own. It's an understandable reaction.

-Tod

Tod Olson t...@uchicago.edu Systems Librarian University of Chicago Library

On May 3, 2010, at 1:08 PM, Bill Dueber wrote: I got email from a person today saying, and I quote, "I must say that [the lack of a browse interface] come as a shock (*which interface cannot browse??*)" [Emphasis mine] Here, a browse interface is one where you can get a giant list of all the titles/authors/subjects, whatever -- a view on the data devoid of any searching. Will those of you out there with browse interfaces in your system take a couple minutes to send along a guesstimate of what percentage of patron sessions involve their use? [Note that for right now, I'm excluding type-ahead search boxes, although there's an obvious and, in my mind, strong argument to be made that they're substantially similar for many types of data.] We don't have a browse interface on our (VuFind) OPAC right now. But in the interest of paying it forward, I can tell you that Mirlyn, our OPAC, has numbers like this. Pct of Mirlyn sessions, Feb/March/April 2010, which included at least one basic search and also: Go to full record view: 46% (we put a lot of info in search results); Select/favorite an item: 15%; Add a facet: 13%; Export record(s) to email/refworks/RIS/etc.: 3.4%; Send to phone (sms): 0.21%; Click on faq/help/AskUs in footer: 0.17% (324 total). Based on 187,784 sessions, 2010.02.01 to 2010.04.31. So... anyone out there able to tell me anything about browse interfaces? -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] MODS and DCTERMS
Quoting Beacom, Matthew matthew.bea...@yale.edu: "According to the report, 69 MARC tags occur in more than 1% of the records in WorldCat. That is quite a few more than Roy's 11, but even accounting for Karen's data elements being equivalent to the number of MARC sub-fields, this is far fewer than the 1,000 data elements available to a cataloger in MARC."

So much depends on how you count things, so at the http://kcoyle.net/rda/ site I have put two MARC-related files. The first is just a list of elements (variable subfields) in alpha order with duplicates removed. Yes, I realize how imperfect this is, and that we will need to look beyond names to the *meaning* of elements to determine what we really have. This file does not include indicators, and sometimes indicators really do create a separate element, like when a person name becomes a family name based on its indicator. That file has over 560 entries.

The next file probably needs some more thought, but it is a list of the variable field indicators and subfields, leaving in subfields that are duplicated in different fields. I removed some of the numeric subfields that didn't seem to result in an actual element (2, 3, 5, 6, 8), but could be wrong about that. I also did not include indicators that are "Undefined". We can debate whether a personal name in an added entry is the same element as a personal name in a subject heading, and similarly for the various places where geographic names are used, titles, etc. etc. This is the analysis that is needed to reduce MARC21 to a cleaner set of data elements. That file has 1421 entries.

Neither of these contains any of the fixed field elements (many of which, IMO, should replace textual elements now carried in MARC21). When I looked at the fixed fields (and this is reported at http://futurelib.pbworks.com/Data+and+Studies), I came up with this count of *unique* fixed field elements (each with multiple values):

008 - 58
007 - 55

Each one of these should become a controlled value list in a SemWeb implementation of MARC. RDA appears to have a total of 68 defined value lists, but I don't believe that those include ones defined elsewhere, such as languages, country codes, etc.

kc

p.s. Linked from that same page is the file I am using for this analysis, in CSV format, if anyone else wants to play with it. I have tried to keep it up to date with MARBI proposals.

Matthew Beacom wrote: By the way, the descriptive fields used in more than 20% of the MARC records in WorldCat are: 245 Title statement 100%; 260 Imprint statement 96%; 300 Physical description 91%; 100 Main entry - personal name 61%; 650 Subject added entry - topical term 46%; 500 General note 44%; 700 Added entry - personal name 28%. They answer, more or less, a few basic questions a user might have about the material: What is it called? Who made it? When was it made? How big is it? What is it about? Answers to the question "How can I get it?" are usually given in the associated MARC holdings record.

-Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Roy Tennant Sent: Monday, May 03, 2010 2:15 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] MODS and DCTERMS

I would even argue with the statement "very detailed, well over 1,000 different data elements, some well-coded data (not all)". There are only 11 (yes, eleven) MARC fields that appear in 20% or more of MARC records currently in WorldCat[1], and at least three of those elements are control numbers or other elements that contribute nothing to actual description.
I would say overall that we would do well to not gloat about our metadata until we've reviewed the facts on the ground. Luckily, now we can. Roy

[1] http://www.oclc.org/research/publications/library/2010/2010-06.pdf

On Mon, May 3, 2010 at 11:03 AM, Eric Lease Morgan emor...@nd.edu wrote: On May 3, 2010, at 1:55 PM, Karen Coyle wrote: 1. MARC the data format -- too rigid, needs to go away 2. MARC21 bib data -- very detailed, well over 1,000 different data elements, some well-coded data (not all); unfortunately trapped in #1

The differences between the two points enumerated above, IMHO, seem to be at the heart of the never-ending debate between computer types and cataloger types when it comes to library metadata. The non-library computer types don't appreciate the value of human-aided systematic description. And the cataloger types don't understand why MARC is a really terrible bit bucket, especially considering the current environment. All too often the two camps don't understand what the other is saying. MARC must die. Long live MARC. -- Eric Lease Morgan

-- Karen Coyle kco...@kcoyle.net http://kcoyle.net ph: 1-510-540-7596 m: 1-510-435-8234 skype: kcoylenet
Re: [CODE4LIB] it's cool to hate on OpenURL (was: Twitter annotations...)
Quoting Jakob Voss jakob.v...@gbv.de: "I bet there are several reasons why OpenURL failed in some way, but I think one reason is that SFX got sold to Ex Libris. Afterwards there was no interest from Ex Libris in getting a simple, clean standard, and most libraries ended up buying a black box with an OpenURL label on it - instead of developing their own systems based on a common standard. I bet you can track most bad library standards to commercial vendors. I don't trust any standard without an open specification and a reusable Open Source reference implementation."

For what it's worth, that does not coincide with my experience. kc

-- Karen Coyle kco...@kcoyle.net http://kcoyle.net ph: 1-510-540-7596 m: 1-510-435-8234 skype: kcoylenet
Re: [CODE4LIB] SRU/ZeeRex explain question : record schemas
From: Jonathan Rochkind rochk...@jhu.edu "But you will leave sorting as part of CQL too in any changes to CQL specs, I hope? I think CQL has a lot of use even outside of SRU proper, so I encourage you to keep its spec not too tightly coupled to SRU."

The OASIS TC firmly supports this approach (and by firmly I mean 100%), so the only way this could get changed is via public comment.

"I think there are at least three ways to sort as part of (different versions of?) SRU now! 1) An actual separate sortKeys query parameter. 2) Included in the CQL expression in query, using the sortBy keyword. 3) In the draft, not finalized: OASIS/SRU 2.0 methods of specifying XPaths for sorting. [Thanks for including the link to the current SRU 2.0 draft; I didn't know that was publicly available anywhere, and it's not really googleable.]"

As you corrected yourself in a subsequent message: "Ah, I think I was wrong below. I must have been looking at different versions of the SRU spec without realizing it. SRU 1.1 includes a sortKeys parameter, and CQL 1.1 does not include a sortBy clause. SRU 1.2 does NOT include a sortKeys parameter, and CQL 1.2 does include a sortBy clause." Yes, that's correct.

"Do I have this right? As SRU 1.2 is the only actual spec I have to work with... am I right that either top-level sortKeys, or embedded in CQL with sortBy, would both be legal in SRU 1.2?"

No. Legal in 2.0, the OASIS version; not legal in 1.2. In 1.2 it is not legal to have a sort parameter in the request. OASIS is standardizing SRU and CQL loosely coupled; that is, SRU can use other query languages and CQL may be invoked by other protocols, but they are generally oriented towards being used together. But since SRU may be used with a query language that might not have sort capability, the TC felt it necessary to include sorting as part of the protocol. Conversely, since CQL may be used by a protocol that doesn't support sorting, CQL should similarly support sorting. There is a section in the draft standard that discusses what to do if a request has conflicting sort specifications. --Ray
Re: [CODE4LIB] MODS and DCTERMS
Thanks, Matthew, for a much more nuanced and accurate depiction of the data. I would encourage anyone interested in this topic to spend some time with this report, which was one result of a great deal of work by many people in research institutions around the world. The findings and recommendations are well worth your time. Roy

On Mon, May 3, 2010 at 11:55 AM, Beacom, Matthew matthew.bea...@yale.edu wrote: Although I agree with Roy's suggestion that librarians not gloat about our metadata, the notion that the value of a data element can be elicited from the frequency of its use in the overall domain of library materials is misleading and contrary to the report Roy cites. The sub-section of the very useful and informative OCLC report that Roy cites is very good on this point. Section 2, "MARC Tag Usage in WorldCat" by Karen Smith-Yoshimura, clearly lays out the data in the context of WorldCat and the cataloging practice of the OCLC members. Library holdings are dominated by texts, and in terms of titles cataloged, texts are dominated by books. This preponderance of books tilts the ratios of use of individual data elements. Many data elements pertain to either a specific form of material (manuscripts, for instance) or to specific content (musical notation, for instance). Some pertain to both (manuscript scores, for instance). Within the total aggregate of library materials, data elements that are specific to a material or content type do not rise in usage rates to anything near 20% of the aggregate total of titles. Yet these elements are necessary or valuable to those wishing to discover and use the materials, and when one recalls that a 1% use rate in WorldCat equals about 1,000,000 titles, the usefulness of many MARC data elements can be seen as widespread. According to the report, 69 MARC tags occur in more than 1% of the records in WorldCat. That is quite a few more than Roy's 11, but even accounting for Karen's data elements being equivalent to the number of MARC sub-fields, this is far fewer than the 1,000 data elements available to a cataloger in MARC. Matthew Beacom

By the way, the descriptive fields used in more than 20% of the MARC records in WorldCat are:

245 Title statement 100%
260 Imprint statement 96%
300 Physical description 91%
100 Main entry - personal name 61%
650 Subject added entry - topical term 46%
500 General note 44%
700 Added entry - personal name 28%

They answer, more or less, a few basic questions a user might have about the material: What is it called? Who made it? When was it made? How big is it? What is it about? Answers to the question "How can I get it?" are usually given in the associated MARC holdings record.

-Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Roy Tennant Sent: Monday, May 03, 2010 2:15 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] MODS and DCTERMS

I would even argue with the statement "very detailed, well over 1,000 different data elements, some well-coded data (not all)". There are only 11 (yes, eleven) MARC fields that appear in 20% or more of MARC records currently in WorldCat[1], and at least three of those elements are control numbers or other elements that contribute nothing to actual description. I would say overall that we would do well to not gloat about our metadata until we've reviewed the facts on the ground. Luckily, now we can.
Roy

[1] http://www.oclc.org/research/publications/library/2010/2010-06.pdf

On Mon, May 3, 2010 at 11:03 AM, Eric Lease Morgan emor...@nd.edu wrote: On May 3, 2010, at 1:55 PM, Karen Coyle wrote: 1. MARC the data format -- too rigid, needs to go away 2. MARC21 bib data -- very detailed, well over 1,000 different data elements, some well-coded data (not all); unfortunately trapped in #1

The differences between the two points enumerated above, IMHO, seem to be at the heart of the never-ending debate between computer types and cataloger types when it comes to library metadata. The non-library computer types don't appreciate the value of human-aided systematic description. And the cataloger types don't understand why MARC is a really terrible bit bucket, especially considering the current environment. All too often the two camps don't understand what the other is saying. MARC must die. Long live MARC. -- Eric Lease Morgan
Re: [CODE4LIB] MODS and DCTERMS
On May 3, 2010, at 2:47 PM, Aaron Rubinstein wrote: 1. MARC the data format -- too rigid, needs to go away 2. MARC21 bib data -- very detailed, well over 1,000 different data elements, some well-coded data (not all); unfortunately trapped in #1 For the sake of my own understanding, I would love an explanation of the distinction between #1 and #2...

Item #1

The first item (#1) is MARC, the data structure -- a container for holding various types of bibliographic information. From one of my older publications [1]:

...the MARC record is a highly structured piece of information. It is like a sentence with a subject, predicate, and objects, separated with commas and semicolons and ending in one period. In data structure language, the MARC record is a hybrid sequential/random access record. The MARC record is made up of three parts: the leader, the directory, and the bibliographic data. The leader (or subject in our analogy) is always represented by the first 24 characters of each record. The numbers and letters within the leader describe the record's characteristics. For example, the length of the record is in positions 1 to 5. The type of material the record represents (authority, bibliographic, holdings, et cetera) is signified by the character at position 7. More importantly, the characters from positions 13 to 17 represent the base. The base is a number pointing to the position in the record where the bibliographic information begins.

The directory is the second part of a MARC record. (It is the predicate in our analogy.) The directory describes the record's bibliographic information with directory entries. Each entry lists the type of bibliographic information (an item called a tag), how long the bibliographic information is, and where the information is stored in relation to the base. The end of the directory and all variable-length fields are marked with a special character, the ASCII character 30.

The last part of a MARC record is the bibliographic information. (It is the object in our sentence analogy.) It is simply all the information (and more) on a catalog card. Each part of the bibliographic information is separated from the rest with the ASCII character 30. Within most of the bibliographic fields are indicators and subfields describing the fields themselves in more detail. The subfields are delimited from the rest of the field with the ASCII character 31. The end of a MARC record is punctuated with an end-of-record mark, ASCII character 29. The ASCII characters 31, 30, and 29 represent our commas, semicolons, and periods, respectively.

At the time, MARC -- the data structure -- was really cool. Consider the environment in 1965. No hard disks. Tape drives instead. Data storage was expensive. The medium had to be read from beginning to end. No (or rarely any) random data access. Thus, the record and field lengths were kept relatively short. (No MARC record can be longer than 99,999 characters, and no MARC field can be longer than 9,999 characters.) Remember too the purpose of MARC -- to transmit the content of catalog cards. Given that the leader, the directory, and the bibliographic sections of a MARC record are all preceded by pseudo-checksums and delimited by non-printable ASCII characters, the MARC record -- the data structure -- comes with a plethora of checks and balances. Very nice.

Fast forward to the present day. Disk space is cheap. Tapes are not the norm. More importantly, the wider computing environment uses XML as its data structure of choice.
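To make the structure described above concrete, here is a minimal parsing sketch in Python that pulls apart the leader, directory, and fields of a single raw ISO 2709 MARC record. The byte offsets follow the description above, zero-based (so "positions 13 to 17" becomes bytes 12-16); the function and variable names are my own, and the assumption is that the variable raw already holds exactly one record as bytes.

  FIELD_TERMINATOR = b'\x1e'    # ASCII 30: ends the directory and every field
  SUBFIELD_DELIMITER = b'\x1f'  # ASCII 31: introduces each subfield
  RECORD_TERMINATOR = b'\x1d'   # ASCII 29: ends the whole record

  def parse_marc(raw):
      """Parse one raw ISO 2709 MARC record into (leader, fields)."""
      assert raw.endswith(RECORD_TERMINATOR)
      leader = raw[0:24]
      assert int(leader[0:5]) == len(raw)  # the leader opens with the record length
      base = int(leader[12:17])            # offset where the bibliographic data starts
      directory = raw[24:base - 1]         # the byte at base - 1 is a field terminator
      fields = []
      for i in range(0, len(directory), 12):  # each directory entry is 12 bytes:
          entry = directory[i:i + 12]         # 3-byte tag, 4-byte length, 5-byte start
          tag = entry[0:3].decode('ascii')
          length = int(entry[3:7])            # field length, including its terminator
          start = int(entry[7:12])            # offset relative to the base
          data = raw[base + start:base + start + length].rstrip(FIELD_TERMINATOR)
          fields.append((tag, data.split(SUBFIELD_DELIMITER)))
      return leader, fields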
If libraries are about sharing information, then we need to communicate with the wider world in its language. The language of the Net is XML, not MARC. Not only is MARC -- the data structure -- stuck on 50-year-old technology; more importantly, it is not the language of the people with whom we want to share.

Item #2

Our bibliographic data (item #2) is the metadata of the Web. While it is important, and it adds a great deal of value, it is not as important as it used to be. It too needs to change. Remember, MARC was originally designed to print catalog cards. Author. Title. Pagination. Series. Notes. Subject headings. Added entries. Looking back, these were relatively simple data elements, but what about system numbers? ISBNs? Holdings information? Tables of contents? Abstracts? Ratings? We have stuffed these things into MARC every which way, and we call MARC flexible. More importantly, and as many have said previously, string values in MARC records lead to maintenance nightmares. Instead, like a relational database model, values need to be described using keys -- pointers -- to the canonical values. This makes find/replace operations painless, enables the use of different languages, and brings numerous other advantages. (See the sketch below.) ISBD is also a pain. Take the following string: Kilgour, Frederick Gridley (1914–2006) There is way too much punctuation going on here. Yes,
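A toy illustration of the keyed-values point above -- all names, keys, and data here are hypothetical. If bibliographic records carry a key into an authority table instead of repeating a literal heading string, correcting the heading is a single update rather than a find/replace across millions of records:

  # Toy sketch: headings stored as keys into an authority table,
  # not as literal strings repeated in every bibliographic record.
  authorities = {
      'auth:42': 'Kilgour, Frederick Gridley, 1914-2006',  # hypothetical key and form
  }
  bib_records = [
      {'title': 'Collected papers', 'author': 'auth:42'},
      {'title': 'A festschrift', 'author': 'auth:42'},
  ]

  # Correcting the heading touches one row; every record follows automatically.
  authorities['auth:42'] = 'Kilgour, Frederick G. (Frederick Gridley), 1914-2006'

  for rec in bib_records:
      print(rec['title'], '/', authorities[rec['author']])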
Re: [CODE4LIB] A call for your OPAC (or other system) statistics! (Browse interfaces)
On Mon, May 3, 2010 at 7:10 PM, Bryan Baldus bryan.bal...@quality-books.com wrote: I can't speak for other users (particularly the generic patron user type), but as a cataloger/librarian user, ...and THERE IT IS, ladies and gentlemen. I've started trying to keep a list of IP addresses I *know* are staff and separate out the statistics. The OPAC isn't for the librarians; the ILS client is. If the client sucks so badly that librarians need the OPAC to do our job (as I was told several times during our rollout of VuFind), then the solution is to fix the client, or (alternatively) build up a workaround for staff, NOT to overload the OPAC. If librarians need specialized tools, let's just build them without some sort of pretense that they're anything but the tiniest blip on the bell curve of patrons. And, BTW, just because you (and you know who you are!) do 8 hours of reference desk work a week doesn't mean you have a hell of a lot more insight. The patrons who self-select to actually speak to a librarian sitting *in the library* are a freakshow themselves, statistically speaking. [Not meaning to imply that Bryan doesn't know the difference between himself and a normal patron; his post makes it clear that he does. I just took the opportunity to rant.] I'm not saying that patrons don't use browse much (that's what I'm trying to determine). But, to borrow from the 2009 code4lib conference, every time a librarian's work habits inform the design of a public-facing application, God kills a kitten. -Bill- -- Bill Dueber Library Systems Programmer University of Michigan Library
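For what it's worth, the staff/patron split Bill describes can be approximated from ordinary web server logs. A minimal sketch in Python, where the log path, the set of staff IP addresses, and the /Browse URL path are all hypothetical placeholders, not anything from a real OPAC:

  # Sketch: split OPAC usage statistics into staff vs. patron traffic.
  from collections import Counter

  STAFF_IPS = {'10.0.5.17', '10.0.5.18'}  # hypothetical staff workstations
  hits = {'staff': Counter(), 'patron': Counter()}

  with open('access.log') as log:          # hypothetical common-format access log
      for line in log:
          parts = line.split()
          if len(parts) < 7:
              continue
          ip, path = parts[0], parts[6]    # client IP and request path
          group = 'staff' if ip in STAFF_IPS else 'patron'
          action = 'browse' if path.startswith('/Browse') else 'other'
          hits[group][action] += 1

  for group, counter in hits.items():
      print(group, dict(counter))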
Re: [CODE4LIB] it's cool to hate on OpenURL (was: Twitter annotations...)
On Mon, May 3, 2010 at 6:34 PM, Karen Coyle li...@kcoyle.net wrote: Quoting Jakob Voss jakob.v...@gbv.de: I bet there are several reasons why OpenURL failed in some ways, but I think one reason is that SFX got sold to Ex Libris. Afterwards, Ex Libris had no interest in a simple, clean standard, and most libraries ended up buying a black box with an OpenURL label on it instead of developing their own systems based on a common standard. I bet you can trace most bad library standards back to commercial vendors. I don't trust any standard without an open specification and a reusable open source reference implementation. For what it's worth, that does not coincide with my experience. I'm going to turn this back on Karen and say that much of my pain does come from vendors, but it comes from their shitty data. OpenURL and resolvers would be a much more valuable piece of technology if the vendors would/could get off their collective asses(1) and give us better data. -Bill- (1) By this, of course, I mean if the librarians would grow a pair and demand better data via our contracts -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] A call for your OPAC (or other system) statistics! (Browse interfaces)
When it's actually a reference librarian using it for reference/research tasks, I think it can be a legitimate use case -- so long as you remember that it is representative of only a certain type of expert searcher (not necessarily even every searcher requiring sophisticated or complex features, just a certain type with certain tasks), which represents a minority of searchers, and you don't over-emphasize its importance beyond its actual representativeness -- don't sacrifice the needs of the majority of users for a minority. When the tasks are related to cataloging and assigning headings, I absolutely and completely agree with Bill: this is not an appropriate use case for a public interface. So, Bill, you're still not certain yourself exactly what purposes browse is used for by actual non-librarian searchers, if anything? Jonathan
Re: [CODE4LIB] A call for your OPAC (or other system) statistics! (Browse interfaces)
On Mon, May 3, 2010 at 8:39 PM, Jonathan Rochkind rochk...@jhu.edu wrote: So, Bill, you're still not certain yourself exactly what purposes browse is used for by actual non-librarian searchers, if anything? Right. I'm not sure *the extent* to which it's used (data which are necessarily going to be messy and partially driven by how prevalent browse vs search are in the interface), and I certainly don't know what's going through people's heads when they choose to use it (on those occasions when they make a conscious choice to use browse in addition to/instead of search). My attempts to find stuff in the research literature failed me; if anyone has other pointers, I'd love to read them! (If only there was a real librarian around to help poor little me...) -Bill-
Re: [CODE4LIB] it's cool to hate on OpenURL (was: Twitter annotations...)
Bill Dueber wrote: if the librarians would grow a pair and demand better data via our contracts While I agree with your overall point, it would have been better made without the gendered phrasing, in my view. cheers stuart -- Stuart Yeates http://www.nzetc.org/ New Zealand Electronic Text Centre http://researcharchive.vuw.ac.nz/ Institutional Repository
Re: [CODE4LIB] OpenURL and DAIA
We are just starting to use DAIA for a small register of journal holdings, in connection with VuFind and the new DAIA driver in VuFind. Since the holdings register is not a big union catalog, but rather a simple database in which you simply mark which journal (ISSN) you hold for which period, we send the requests by OpenURL, do some ISSN mapping, and send back DAIA responses. We will use this in connection with an open and cooperative reference database for nursing literature. DAIA works very well for us. There should perhaps be an official way to request subsets of holdings and to transport some information in DAIA, e.g. about ILL fees (probably you can do the latter in the limitation tag?). But we work around this by combining DAIA with IP-based requests. So we can do crazy stuff like showing institution-specific availability in the overview of a search, and showing general availability in the details of a record. I think with DAIA Jakob created a simple and lightweight solution to a real problem in the library world. Markus

Jakob Voss schrieb: Owen wrote: Although part of the problem is that you might want to offer any service on the basis of an OpenURL, the major use case is supply of a document (either online or via ILL) -- so it strikes me you could look at DAIA http://www.gbv.de/wikis/cls/DAIA_-_Document_Availability_Information_API ? Jakob, does this make sense?

Just having read Joel Spolsky's article about Architecture Astronauts that Mike pointed to [1], I hesitate to propagate everything you can do with DAIA. But your use case makes sense if you want to offer services provided or mediated by a specific institution (such as a library) for a specific publication. Inspired by your idea to combine OpenURL and DAIA, I updated the DAIA Perl library [2] and hacked up a DAIA server that also understands some very limited OpenURL (it only knows books with ISBNs). You can look up which library in the GBV library union has a specific publication, by its identifier:

http://ws.gbv.de/daia/gvk/?id=gvk:ppn:48574418X

or by OpenURL:

http://ws.gbv.de/daia/gvk/?ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.isbn=0-471-38393-7

Have a look at the simple source code of this script at http://daia.svn.sourceforge.net/viewvc/daia/trunk/daiapm/examples/gvk.pl?view=markup

I want to stress that this demo DAIA server does not use the full expressive power of DAIA; in fact, it does not provide any availability information at all -- but hopefully you get the concept. Cheers Jakob

[1] http://www.joelonsoftware.com/articles/fog18.html
[2] https://sourceforge.net/projects/daia/files/DAIA-0.27.tar.gz
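A minimal client-side sketch of the OpenURL-to-DAIA lookup described above, in Python with the requests library. The endpoint and parameters are taken directly from Jakob's example URL; whether the demo server still answers is, of course, not guaranteed:

  # Sketch: ask the demo DAIA server about a book, identified by ISBN,
  # using an OpenURL-style request. Endpoint and parameters come from
  # the example above; the server itself may no longer be running.
  import requests

  params = {
      'ctx_ver': 'Z39.88-2004',
      'rft_val_fmt': 'info:ofi/fmt:kev:mtx:book',
      'rft.isbn': '0-471-38393-7',
  }
  response = requests.get('http://ws.gbv.de/daia/gvk/', params=params)
  response.raise_for_status()
  print(response.text)  # a DAIA document listing libraries that hold the book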