[CODE4LIB] OR09 Workshop / Tools for Repositories - Microsoft Research & the Scholarly Information Ecosystem
[Forwarded upon request --ELM] (apologies for cross-postings) Colleagues, Microsoft Research will be hosting a workshop at the Open Repositories '09 meeting in Atlanta, GA from 1-6pm on Thursday, May 21st (https://or09.library.gatech.edu/workshops.php). A preliminary description is below, with a more detailed agenda to be made available shortly. Registration is open now (email us at scho...@microsoft.com); we are able to accommodate up to 50 attendees. We hope you will join us! Tools for Repositories - Microsoft Research & the Scholarly Information Ecosystem Microsoft External Research strongly supports the process of research and its role in the innovation ecosystem, including developing and supporting efforts in open access, open tools, open technology, and interoperability. We partner with universities, national libraries, publishers, and governmental organizations to help develop tools and services to evolve the scholarly information lifecycle. These projects demonstrate our ongoing work towards producing next-generation documents that increase productivity and empower authors to increase the discoverability and appropriate re-use of their work. This workshop will provide a deep dive into several freely available and open source tools from Microsoft External Research, and will demonstrate how these can help supplement and enhance current repository offerings. Come learn more about how the Microsoft Research tools can help extend the reach and utility of your repository efforts.
Each session during the half-day workshop will include a hands-on component so that attendees can gain a deeper technical understanding of the available tool-set, which includes the following resources:
* Article Authoring Add-in for Word 2007
  o Structured document authoring (based on the NLM-DTD)
  o Ontology integration and markup
  o Repository search integration
  o ORE Resource Map authoring
  o Article repository submission workflow (via REST and SWORD interfaces)
* Microsoft eJournal Service - a hosted peer-review workflow system
* Zentity - our v1.0 research-output repository platform
* Research Information Centre - a collaboration space for researchers
* Windows Live Machine Translation Service
* Document Conversion Service
More information on each of these tools can be found at: http://www.microsoft.com/scholarlycomm
To register, please email us at scho...@microsoft.com
We look forward to having you join us!
--
Lee Dirks
Director, Education & Scholarly Communication
Microsoft Corporation - External Research
ldi...@microsoft.com
(425) 703-6866
http://microsoft.com/scholarlycomm
Re: [CODE4LIB] registering info: uris?
So hey, I'm sure nobody wanted to see this thread revived, but I'm hoping you info uri folks can clear something up for me. So I'm trying to gather together a vocabulary of identifiers to unambiguously describe the format of the data you would be getting in a Jangle feed or an UnAPI response (or any other variation on this theme). I have a MODS document and I want *you* to have it too! Jakob Voss made the (reasonable) suggestion that rather than create yet another identifier or registry to describe these formats, it would make sense to use the work that the SRU (http://www.loc.gov/standards/sru/resources/schemas.html) or OpenURL (http://alcme.oclc.org/openurl/servlet/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc&set=Core:Metadata+Formats) communities have already done. Which makes a lot of sense. It would be nice to use the same identifier in Jangle, SRU and OpenURL to say that this is a MARCXML or ONIX record. Except that OpenURL and SRU /already use different info URIs to describe the same things/.

info:srw/schema/1/marcxml-v1.1
info:ofi/fmt:xml:xsd:MARC21

or

info:srw/schema/1/onix-v2.0
info:ofi/fmt:xml:xsd:onix

What is the rationale for this? How do we keep up? Are they reusable? Which one should be used? Doesn't this pretty horribly undermine the purpose of using info URIs in the first place? Is anybody else interested in working on a way to unambiguously say here is a Dublin Core resource as XML, but it is not OAI DC or this is text/x-vcard, it conforms to vCard 3.0 in a way that we can reuse among all of our various ways of sharing data? Thanks, -Ross.
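For a sense of what reconciling these would mean in practice, here is a minimal sketch in Python. The equivalence table and function name are my own invention (not part of SRU, OpenURL, Jangle, or UnAPI); it simply normalizes whichever community's URI an application receives to a single canonical key, using the pairs quoted above.

```python
# Hypothetical equivalence table: the same data format is identified by
# different info URIs in the SRU and OpenURL registries. The canonical
# keys ("marcxml", "onix") are illustrative, not from either registry.
EQUIVALENT_FORMAT_URIS = {
    "marcxml": {
        "info:srw/schema/1/marcxml-v1.1",  # SRU schema registry
        "info:ofi/fmt:xml:xsd:MARC21",     # OpenURL format registry
    },
    "onix": {
        "info:srw/schema/1/onix-v2.0",
        "info:ofi/fmt:xml:xsd:onix",
    },
}

def canonical_format(uri):
    """Return the canonical format key for a known info URI, or None."""
    for key, uris in EQUIVALENT_FORMAT_URIS.items():
        if uri in uris:
            return key
    return None
```

Of course, a hard-coded table like this is exactly the kind of per-application workaround that a shared registry would make unnecessary.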
Re: [CODE4LIB] Recommend book scanner?
At Wed, 29 Apr 2009 13:32:08 -0400, Christine Schwartz wrote: We are looking into buying a book scanner which we'll probably use for archival papers as well--probably something in the $1,000.00 range. Any advice? Most organizations, or at least the big ones, Internet Archive and Google, seem to be using a design based on 2 fixed cameras rather than a traditional scanner-type device. Is this what you had in mind? Unfortunately none of these products are cheap. Internet Archive’s Scribe machine cost upwards (3 years ago) of $15k, [1] mostly because it has two very expensive cameras. Google’s data is unavailable. A company called Kirtas also sells what look like very expensive machines of a similar design. On the other hand, there are projects like bkrpr [2] and [3], home-brew scanning stations built for marginally more than the cost of a pair of $100 cameras. I think that these are a real possibility for smaller organizations. The maturity of the software and workflow is problematic, but with Google’s Ocropus OCR software [4] freely available as the heart of a scanning workflow, the possibility is there. Both bkrpr and [3] have software currently available, although in the case of bkrpr at least the software is in the very early stages of development. best, Erik Hetzner
1. http://redjar.org/jared/blog/archives/2006/02/10/more-details-on-open-archives-scribe-book-scanner-project/
2. http://bkrpr.org/doku.php
3. http://www.instructables.com/id/DIY-High-Speed-Book-Scanner-from-Trash-and-Cheap-C/
4. http://code.google.com/p/ocropus/
;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3
Re: [CODE4LIB] Recommend book scanner?
How good are the two-camera apparatuses for scanning things other than books? The thing about the Google and Kirtas scanners is that they are not particularly recommended for dealing with fragile books or otherwise special collections materials. The University of Virginia Library is still using the single-camera method for books as well as manuscripts, photographs, slides, coins, etc. It's a Hasselblad camera with a 45 megapixel Phase One digital back, but significantly out of the $1000 range. I haven't dealt with all the camera hardware you could find, but I have never seen a professional digitization hardware/software suite for as low as a thousand bucks. You could check out i2s's copibook series (http://www.i2s-bookscanner.com/produits.asp?gamme=1003&sX_Menu_selectedID=leftV_1003_MOD), but I have no idea how much they cost; they don't say. Erik's idea of building something custom is an option, but you might not necessarily get consistent quality and production rate. Have you considered partnering with Princeton University's digitization labs? The UVA Health Sciences Library occasionally borrows/trades/buys resources from the university library's digitization services (the health system and university are technically two different entities). For all I know, someone from Princeton University is on this list; I don't know what their resources are and don't presume to speak for them. That's just my idea. Ethan Gruber
On Thu, Apr 30, 2009 at 12:49 PM, Erik Hetzner erik.hetz...@ucop.edu wrote: Most organizations, or at least the big ones, Internet Archive and Google, seem to be using a design based on 2 fixed cameras rather than a traditional scanner-type device. Is this what you had in mind? Unfortunately none of these products are cheap.
Re: [CODE4LIB] Recommend book scanner?
Erik Hetzner wrote: At Wed, 29 Apr 2009 13:32:08 -0400, Christine Schwartz wrote: We are looking into buying a book scanner which we'll probably use for archival papers as well--probably something in the $1,000.00 range. Any advice? Most organizations, or at least the big ones, Internet Archive and Google, seem to be using a design based on 2 fixed cameras rather than a traditional scanner-type device. Is this what you had in mind? This is probably the type of machine that will be needed for books if they need to remain bound throughout the scanning process. For looseleaf materials or for books that can be disbound and are in good condition, you can get inexpensive duplex sheet-feeder scanners for a few hundred dollars that might be good enough. Unfortunately none of these products are cheap. Internet Archive’s Scribe machine cost upwards (3 years ago) of $15k, [1] mostly because it has two very expensive cameras. Google’s data is unavailable. A company called Kirtas also sells what look like very expensive machines of a similar design. $15K seems pretty cheap for that kind of scanner; most that I've seen run from the tens of thousands well into the hundreds, depending on the model and features. I don't remember precisely what IA's Scribe stations cost, but I think they were more in the range of $40-60K CAD; it would probably be cheaper in the US, but not that much cheaper, and I suspect that IA gets some sort of bulk discount for buying them by the truckload.
The main issues to consider are:
- Type of material: is it fragile or not; is it rare; can you afford to damage or destroy a copy during the scanning process; can the items be disbound; what is the minimum and maximum size of item to be scanned; if books are to remain bound, are the bindings tight and how wide are the margins; paper thickness; existence of damage, water spotting, show-through, and other defects
- Scanning resolution required
- Image output (color/greyscale/black and white) and output format (TIFF, JPEG2000, PDF, JPEG)
- Throughput requirement. (How much stuff do you have: dozens/hundreds/thousands/millions of pages, and how quickly do you need to get it done: days/weeks/months/years?)
- How much technical work can/are you willing to do yourself? Can you invest in substantial post-processing, or do you need to be able to press Go on the scanner and produce a more or less finished product? If so, what sort of metadata, OCR, etc. requirements do you have, if any, in addition to getting the basic image?
For some projects, there are suitable desktop scanners available for very little money, and in some cases, using a decent (7 megapixel or higher) digital camera in conjunction with a stand and maybe an image editor like Photoshop (or something free like IrfanView) to crop and deskew afterwards might work just fine, but in other cases, a much more elaborate setup might be needed.
--
William Wueppelmann
Systems Librarian/Programmer
Canadiana.org
http://www.canadiana.org
Re: [CODE4LIB] registering info: uris?
From: Ross Singer rossfsin...@gmail.com Except that OpenURL and SRU /already use different info URIs to describe the same things/.

info:srw/schema/1/marcxml-v1.1
info:ofi/fmt:xml:xsd:MARC21

or

info:srw/schema/1/onix-v2.0
info:ofi/fmt:xml:xsd:onix

What is the rationale for this? None. (Or, whatever rationale there was, historically, should no longer apply.) These should be aligned. Post this to the OpenURL list (and perhaps SRU as well). I'm certainly willing to work to come up with a solution. --Ray
[CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
Hello everybody. I apologize for the crossposting, but this is an area that could (potentially) affect every one of these groups. I realize that not everybody will be able to respond to all lists, but... First of all, some back story (Code4Lib subscribers can probably skip ahead): Jangle [1] requires URIs to explicitly declare the format of the data it is transporting (binary marc, marcxml, vcard, DLF simpleAvailability, MODS, EAD, etc.). In the past, it has used its own URI structure for this (http://jangle.org/vocab/formats#...), but this has always been with the intention of moving out of the jangle.org domain into a more generic space so it could be used by other initiatives. This same concept came up in UnAPI [2] (I think this thread: http://old.onebiglibrary.net/yale/cipolo/gcs-pcs-list/2006-March/thread.html#682 discusses it a bit - there is a reference there that it maybe had come up before), although it was ultimately rejected in favor of an (optional) approach more in line with how OAI-PMH disambiguates metadata formats. That being said, this page used to try to set some sort of convention around the UnAPI formats: http://unapi.stikipad.com/unapi/show/existing+formats But it's now just a squatter page. Jakob Voss pointed out that SRU has a schema registry and that it would make sense to coordinate with this rather than mint new URIs for things that have already been defined there: http://www.loc.gov/standards/sru/resources/schemas.html This, of course, made a lot of sense.
It also made me realize that OpenURL *also* has a registry of metadata formats: http://alcme.oclc.org/openurl/servlet/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc&set=Core:Metadata+Formats The problem here is that OpenURL and SRW are using different info URIs to describe the same things:

info:srw/schema/1/marcxml-v1.1
info:ofi/fmt:xml:xsd:MARC21

or

info:srw/schema/1/onix-v2.0
info:ofi/fmt:xml:xsd:onix

The latter technically isn't the same thing, since the OpenURL one claims it's an identifier for ONIX 2.1, but if I wasn't sending this email now, eventually SRU would have registered info:srw/schema/1/onix-v2.1. There are several other examples as well (MODS, ISO 20775, etc.), and it's not a stretch to envision more in the future. So there are a couple of questions here. First, and most importantly, how do we reconcile these different identifiers for the same thing? Can we come up with some agreement on which ones we should really use? Secondly, and this gets to the reason why any of this was brought up in the first place, how can we coordinate these identifiers more effectively and efficiently to reuse among various specs and protocols, but not:
1) be tied to a particular community
2) require some laborious and lengthy submission and review process to just say hey, here's my FOAF available via UnAPI
3) be so lax that it throws all hope of authority out the window
? I would expect the various communities to still maintain their own registries of approved data formats (well, OpenURL and SRU, anyway -- it's not as appropriate to UnAPI or Jangle). Does something like this interest any of you? Is there value in such an initiative? Thanks, -Ross.
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
Thanks, Ross. For SRU, this is an opportune time to reconcile these differences. Opportune, because we are approaching standardization of SRU/CQL within OASIS, and there will be a number of areas that need to change. Some observations.
1. The 'ofi' namespace of 'info' has the advantage that the name, ofi, isn't necessarily tied to a community or application (I suppose one could claim that the acronym ofi means openURL something starting with 'f' for Identifiers, but it doesn't say so anywhere that I can find). However, the namespace itself (if not the name) is tied to OpenURL: Namespace of Registry Identifiers used by the NISO OpenURL Framework Registry. That seems like a simple problem to fix. (Changing that title would not cause any technical problems.)
2. In contrast, with the srw namespace, the actual name is srw. So at least in name, it is tied to an application.
3. On the other side, the srw namespace has the distinct advantage of built-in extensibility. For the URI info:srw/schema/1/onix-v2.0, the 1 is an authority. There are (currently) 15 such authorities; they are listed in the (second) table at http://www.loc.gov/standards/sru/resources/infoURI.html Authority 1 is the SRU maintenance agency, and the objects registered under that authority are, more-or-less, public. But objects can be defined under the other authorities with no registration process required.
4. ofi does not offer this sort of extensibility.
So, if we were going to unify these two systems (and I can't speak for the SRU community and commit to doing so yet), the extensibility offered by the srw approach would be an absolute requirement. If it could somehow be built in to ofi, then I would not be opposed to migrating the srw identifiers. Another approach would be to register an entirely new 'info:' URI namespace and migrate all of these identifiers to the new namespace.
--Ray
- Original Message - From: Ross Singer rossfsin...@gmail.com To: z...@listserv.loc.gov Sent: Thursday, April 30, 2009 2:59 PM Subject: One Data Format Identifier (and Registry) to Rule Them All
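To illustrate the structure Ray describes, here is a small Python sketch (the function name and return shape are my own, not from any SRU specification) that splits an info:srw schema URI into the authority component and the object name. The authority slot is what gives the namespace its built-in extensibility: anyone controlling an authority can mint names under it without a central registration step.

```python
def parse_srw_schema_uri(uri):
    """Split info:srw/schema/<authority>/<name> into (authority, name).

    Authority "1" is the SRU maintenance agency; other authorities can
    define objects without a registration process.
    """
    prefix = "info:srw/schema/"
    if not uri.startswith(prefix):
        raise ValueError("not an SRU schema URI: %s" % uri)
    authority, _, name = uri[len(prefix):].partition("/")
    if not authority or not name:
        raise ValueError("malformed SRU schema URI: %s" % uri)
    return authority, name
```

By contrast, an info:ofi URI such as info:ofi/fmt:xml:xsd:onix has no comparable authority slot, which is Ray's point 4.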
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
Some further observations. So far this threadling has mentioned only trying to unify two different sets of identifiers. However there are a much larger number of them out there (and even larger numbers of schemas and other standard-things-that-everyone-should-use-so-we-all-know-what-we-are-talking-about), and the problem exists for any of these things (identifiers, etc.) where there are more than one of them. So really unifying two sets of identifiers, while very useful, is not actually going to solve much. Is there any broader methodology we could adopt which potentially allows multiple unifications or (my favourite) cross-walks? (Complete unification requires that everybody agrees and sticks to it, and human history is sort of not on that track...) And who (people and organizations) would undertake this?

Ross's point about a lightweight approach is necessary for any sort of adoption, but this is a problem (which plagues all we do in federated search) which cannot just be solved by another registry. Somebody/some organisation has to look at the identifiers or whatever and decide that two of them are identical or, worse, only partially overlap, and hence scope has to be defined. In a syntax that all understand, of course. Already in this thread we have the sub/super case question from Karen (in a post on the openurl (or Z39.88 - sigh - identifiers!) listserv). And the various identifiers for MARC (below) could easily be for MARC-XML, MARC21-ISO2709, MARCUK-ISO2709. Now explain in words of one (computer-understandable) syllable what the differences are.

I'm not trying to make problems. There are problems, and this is only a small subset of them, and they confound us every day. I would love to adopt standard definitions for these things, but which Standard? Because anyone can produce any identifier they like, we have decided that the unification of them has to be kept internal, where we at least have control of the unifications, even if they change pretty frequently.
Peter
Dr Peter Noerr
CTO, MuseGlobal, Inc.
+1 415 896 6873 (office)
+1 415 793 6547 (mobile)
www.museglobal.com
-----Original Message----- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Ross Singer Sent: Thursday, April 30, 2009 12:00 To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
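As a rough illustration of the internal crosswalk approach Peter describes, here is a Python sketch in which each link between identifiers carries an explicit relation rather than assuming exact equivalence. The relation labels loosely borrow SKOS mapping terms and are illustrative only; the table contents come from the URI pairs discussed upthread.

```python
# Illustrative internal crosswalk: each entry records how two
# identifiers relate, since (as noted in this thread) they may only
# partially overlap. Relation names are assumptions for this sketch.
CROSSWALK = [
    ("info:srw/schema/1/marcxml-v1.1", "exactMatch",
     "info:ofi/fmt:xml:xsd:MARC21"),
    # The OpenURL registry describes ONIX 2.1 while SRU registers 2.0,
    # so this link is only approximate:
    ("info:srw/schema/1/onix-v2.0", "closeMatch",
     "info:ofi/fmt:xml:xsd:onix"),
]

def relations_for(uri):
    """All (relation, other-identifier) pairs recorded for a URI."""
    out = []
    for left, rel, right in CROSSWALK:
        if left == uri:
            out.append((rel, right))
        elif right == uri:
            out.append((rel, left))
    return out
```

The maintenance burden Peter raises is visible even at this toy scale: someone has to decide, pair by pair, whether a link is exact or merely close, and revise it when either registry changes.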
Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
Crosswalk is exactly the wrong answer for this. Two very small overlapping communities of library developers can surely agree on using the same identifiers, and then we make things easier for US. We don't need to solve the entire universe of problems. Solve the simple problem in front of you in the simplest way that could possibly work and still leave room for future expansion and improvement. From that, we learn how to solve the big problems, when we're ready. Overreach and try to solve the huge problem including every possible use case, many of which don't apply to you but SOMEDAY MIGHT... and you end up with the kind of over-abstracted, over-engineered, too-complicated-to-actually-catch-on solutions that... we in the library community normally end up with.
From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Peter Noerr [pno...@museglobal.com] Sent: Thursday, April 30, 2009 6:37 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All
[CODE4LIB] job posting: interface programmer, University of Michigan
Please forward to anyone who may be interested in this position. This position is also listed on the U-M jobs site (http://www.umich.edu/~jobs/) under posting #30698.
###
Scholarly Publishing Office
University of Michigan University Library
Interface Programmer
The Scholarly Publishing Office (SPO) of the University of Michigan University Library is seeking an Interface Programmer for a full-time, two-year term appointment with the possibility of renewal. The SPO Interface Programmer works in a team environment for online publishing of scholarly literature and is primarily responsible for implementing interfaces for a broad variety of scholarly publications, making contributions to interface design and usability testing, using open standards and open-source software. The Interface Programmer will also contribute to the design and development of online publishing tools, including content management systems. Work will primarily focus on coding publication-specific interface customizations for SPO's locally developed publishing platform, DLXS (see http://www.dlxs.org). Other projects may include: system-wide improvements to the DLXS interface; implementation of a content management system for digitalculturebooks (an online book series); interface support for collaborative publishing projects with the University of Michigan Press; interface specification for a database of 20 disciplinarily related journals; assessment and implementation of strategies to increase the discoverability of SPO publications; and participation in a review of electronic publishing platforms.
DUTIES:
* Implement interface customizations for DLXS publications, fulfilling requests by the editors (40%)
* Maintain and develop CMS-based publishing tools and services (35%)
* Evaluate, recommend, and implement DLXS-wide interface improvements (15%)
* Assist in research and evaluation of publishing software solutions (5%)
* Document interface features for end users and other digital library developers (5%)
QUALIFICATIONS:
REQUIRED: Bachelor's degree plus three years of relevant experience as a programmer, plus one year of experience with Web or other interface design. Thorough knowledge of XHTML, XML, CSS, and at least one high-level programming language (Perl or PHP preferred), and experience with a UNIX-like OS. Excellent written and oral skills. Ability to work in a team environment. Attention to detail.
DESIRED: Experience with XSLT. Experience in graphic design for electronic media. Knowledge of and experience with digital libraries or electronic publishing. Proficiency with content management system development and design (particularly Drupal). Experience with relational databases and SQL. Experience using online scholarly resources. Experience with the theory and practice of usability testing and information architecture. ALA-accredited master's degree in library or information studies or equivalent advanced degree and experience.
ABOUT SPO: The services of SPO are part of the Library's service to the University of Michigan, and are developed in keeping with the Library's concern about issues of intellectual property, long-term retention and archiving of content, and its support of scholarship in all forms. SPO is currently responsible for the online publication of a range of scholarly literature, including significant text-based journal collections, scholarly monographs, conference proceedings, scholarly bibliographies, and image collections.
SPO is a highly collaborative work environment in which staff engage in both the daily work of publishing and in broader discussions of scholarly communication and the transformational potential of digital communication technology. All staff are involved in direction-setting, in articulating a vision of the library as scholarly publisher, and in putting that vision into practice. We have an inclination toward open source software and balanced, pragmatic approaches to problems. We work to embrace emerging technology standards and best practices of the digital library community. Rank is anticipated at the level of Programmer Analyst Intermediate, or Assistant or Associate Librarian. Positions receive 24 days of vacation a year and 15 days of sick leave a year with provisions for extended benefits, as well as opportunities for travel and professional development. Retirement: TIAA-CREF or Fidelity Investments.
To apply, please send cover letter and copy of resume to:
Library Human Resources
404 Hatcher Library North
University of Michigan
Ann Arbor, MI 48109-1205
Contact libhum...@umich.edu or 734-764-2546 for further information.