[CODE4LIB] OR09 Workshop / Tools for Repositories - Microsoft Research & the Scholarly Information Ecosystem

2009-04-30 Thread Eric Lease Morgan

[Forwarded upon request --ELM]

(apologies for cross-postings)

Colleagues, Microsoft Research will be hosting a workshop at the Open  
Repositories '09 meeting in Atlanta, GA from 1-6pm on Thursday, May  
21st (https://or09.library.gatech.edu/workshops.php). A preliminary  
description appears below, with a more detailed agenda to be made  
available shortly.


Registration is open now (email us at scho...@microsoft.com);  
we are able to accommodate up to 50 attendees. We hope you will  
join us!


Tools for Repositories - Microsoft Research & the Scholarly  
Information Ecosystem


Microsoft External Research strongly supports the process of research  
and its role in the innovation ecosystem, including developing and  
supporting efforts in open access, open tools, open technology, and  
interoperability.  We partner with universities, national libraries,  
publishers, and governmental organizations to help develop tools and  
services to evolve the scholarly information lifecycle.  These  
projects demonstrate our ongoing work towards producing next- 
generation documents that increase productivity and empower authors to  
increase the discoverability and appropriate re-use of their work.   
This workshop will provide a deep dive into several freely available  
and open source tools from Microsoft External Research, and will  
demonstrate how these can help supplement and enhance current  
repository offerings.


Come learn more about how the Microsoft Research tools can help extend  
the reach and utility of your repository efforts.  Each session during  
the half-day workshop will include a hands-on component so that  
attendees can gain a deeper technical understanding of the available  
tool-set, which includes the following resources:


  * Article Authoring Add-in for Word 2007
 o Structured document authoring (based on
   the NLM-DTD)
 o Ontology integration and markup
 o Repository search integration
 o ORE Resource Map authoring
 o Article repository submission workflow
   (via REST and SWORD interfaces)
  * Microsoft eJournal Service - a hosted
peer-review workflow system
  * Zentity - Our v1.0 research-output
repository platform
  * Research Information Centre - a
collaboration space for researchers
  * Windows Live Machine Translation Service
  * Document Conversion Service
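At the protocol level, the SWORD-based repository submission workflow mentioned in the first bullet is essentially an HTTP POST with a few extra headers. A hypothetical sketch in Python (the collection URL and filename are placeholders, not real Microsoft or repository endpoints; the X-Packaging value is one of the published SWORD packaging URIs, shown for illustration only):

```python
# Hypothetical sketch of a SWORD 1.3-style deposit request.
# Builds the request but does not send it; URLs are placeholders.
import urllib.request

def build_sword_deposit(collection_url, filename, payload):
    """Construct (but do not send) an HTTP POST for a SWORD deposit."""
    request = urllib.request.Request(collection_url, data=payload,
                                     method="POST")
    request.add_header("Content-Type", "application/zip")
    request.add_header("Content-Disposition", "filename=%s" % filename)
    # X-Packaging tells the repository how the package is structured.
    request.add_header("X-Packaging",
                       "http://purl.org/net/sword-types/METSDSpaceSIP")
    return request

req = build_sword_deposit("http://repository.example.edu/sword/collection",
                          "article.zip", b"...zip bytes...")
```

Sending it with `urllib.request.urlopen(req)` would perform the actual deposit; the repository's Atom response then carries the identifier of the new item.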

More information on each of these tools can be found at: 
http://www.microsoft.com/scholarlycomm

To register, please email us at scho...@microsoft.com



We look forward to having you join us!

--
Lee Dirks
Director, Education & Scholarly Communication
Microsoft Corporation - External Research
ldi...@microsoft.com
(425) 703-6866
http://microsoft.com/scholarlycomm


Re: [CODE4LIB] registering info: uris?

2009-04-30 Thread Ross Singer
So hey, I'm sure nobody wanted to see this thread revived, but I'm hoping
you info URI folks can clear something up for me.

So I'm trying to gather together a vocabulary of identifiers to
unambiguously describe the format of the data you would be getting in
a Jangle feed or an UnAPI response (or any other variation on this
theme).  "I have a MODS document and I want *you* to have it too!"

Jakob Voss made the (reasonable) suggestion that, rather than create
yet another identifier or registry to describe these formats, it
would make sense to use the work that the SRU:

http://www.loc.gov/standards/sru/resources/schemas.html

or OpenURL:

http://alcme.oclc.org/openurl/servlet/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc&set=Core:Metadata+Formats

communities have already done.  Which makes a lot of sense.  It would
be nice to use the same identifier in Jangle, SRU and OpenURL to say
that this is a MARCXML or ONIX record.

Except that OpenURL and SRU /already use different info URIs to
describe the same things/.

info:srw/schema/1/marcxml-v1.1

info:ofi/fmt:xml:xsd:MARC21

or

info:srw/schema/1/onix-v2.0

info:ofi/fmt:xml:xsd:onix

What is the rationale for this?  How do we keep up?  Are they
reusable?  Which one should be used?  Doesn't this pretty horribly
undermine the purpose of using info URIs in the first place?

Is anybody else interested in working on a way to unambiguously say
"here is a Dublin Core resource as XML, but it is not OAI DC" or "this
is text/x-vcard; it conforms to vCard 3.0" in a way that we can reuse
among all of our various ways of sharing data?
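The reconciliation being asked for here could start life as nothing fancier than an equivalence table mapping each community's info URI onto one canonical key. A minimal sketch (the canonical keys are invented for illustration; the groupings follow the pairings in the message, and note the ONIX pair is only an approximate match):

```python
# Hypothetical equivalence table: each canonical format key maps to the
# set of info URIs that different communities use for (roughly) the
# same thing.  Not an official registry of any kind.
FORMAT_EQUIVALENTS = {
    "marcxml": {"info:srw/schema/1/marcxml-v1.1",
                "info:ofi/fmt:xml:xsd:MARC21"},
    "onix":    {"info:srw/schema/1/onix-v2.0",
                "info:ofi/fmt:xml:xsd:onix"},
}

def canonical_format(uri):
    """Return the canonical key for a known info URI, or None."""
    for key, uris in FORMAT_EQUIVALENTS.items():
        if uri in uris:
            return key
    return None
```

A Jangle feed or UnAPI endpoint could then label its payloads with the canonical key (or any member of the set) and still recognize whichever URI a client happens to send.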

Thanks,
-Ross.


Re: [CODE4LIB] Recommend book scanner?

2009-04-30 Thread Erik Hetzner
At Wed, 29 Apr 2009 13:32:08 -0400,
Christine Schwartz wrote:
 
 We are looking into buying a book scanner which we'll probably use for
 archival papers as well--probably something in the $1,000.00 range.
 
 Any advice?

Most organizations, or at least the big ones, Internet Archive and
Google, seem to be using a design based on 2 fixed cameras rather than
a traditional scanner-type device. Is this what you had in mind?

Unfortunately none of these products are cheap. Internet Archive’s
Scribe machine cost upwards of $15k (3 years ago), [1] mostly because
it has two very expensive cameras. Google’s data is unavailable. A
company called Kirtas also sells what look like very expensive
machines of a similar design.

On the other hand, there are projects like bkrpr [2] and [3],
home-brew scanning stations built for marginally more than the cost of
a pair of $100 cameras. I think that these are a real possibility for
smaller organizations. The maturity of the software and workflow is
problematic, but with Google’s Ocropus OCR software [4] freely
available as the heart of a scanning workflow, the possibility is
there. Both bkrpr and [3] have software currently available, although
in the case of bkrpr at least the software is in the very early stages
of development.

best,
Erik Hetzner

1. 
http://redjar.org/jared/blog/archives/2006/02/10/more-details-on-open-archives-scribe-book-scanner-project/
2. http://bkrpr.org/doku.php
3. 
http://www.instructables.com/id/DIY-High-Speed-Book-Scanner-from-Trash-and-Cheap-C/
4. http://code.google.com/p/ocropus/
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3


pgpYI2WLVtxUI.pgp
Description: PGP signature


Re: [CODE4LIB] Recommend book scanner?

2009-04-30 Thread Ethan Gruber
How good are the two-camera apparatuses for scanning things other than
books?  The thing about the Google and Kirtas scanners is that they are not
particularly recommended for dealing with fragile books or otherwise special
collections materials.  The University of Virginia Library is still using
the single camera method for books as well as manuscripts, photographs,
slides, coins, etc.  It's a Hasselblad camera with a 45 megapixel Phase One
digital back, but significantly out of the $1000 range.  I haven't dealt
with all the camera hardware you could find, but I have never seen a
professional digitization hardware/software suite for as low as a thousand
bucks.  You could check out i2s's copibook series (
http://www.i2s-bookscanner.com/produits.asp?gamme=1003&sX_Menu_selectedID=leftV_1003_MOD),
but I have no idea how much they cost; they don't say.  Erik's idea of
building something custom is an option, but you might not necessarily get
consistent quality and production rate.

Have you considered partnering with Princeton University's digitization
labs?  The UVA Health Sciences Library occasionally borrows/trades/buys
resources from the university library's digitization services (the health
system and university are technically two different entities).  For all I
know, someone from Princeton University is on this list; I don't know what
their resources are and don't presume to speak for them.  That's just my
idea.

Ethan Gruber

On Thu, Apr 30, 2009 at 12:49 PM, Erik Hetzner erik.hetz...@ucop.edu wrote:

 At Wed, 29 Apr 2009 13:32:08 -0400,
 Christine Schwartz wrote:
 
  We are looking into buying a book scanner which we'll probably use for
  archival papers as well--probably something in the $1,000.00 range.
 
  Any advice?

 Most organizations, or at least the big ones, Internet Archive and
 Google, seem to be using a design based on 2 fixed cameras rather than
 a traditional scanner-type device. Is this what you had in mind?

 Unfortunately none of these products are cheap. Internet Archive’s
 Scribe machine cost upwards (3 years ago) of $15k, [1] mostly because
 it has two very expensive cameras. Google’s data is unavailable. A
 company called Kirtas also sells what look like very expensive
 machines of a similar design.

 On the other hand, there are projects like bkrpr [2] and [3],
 home-brew scanning stations built for marginally more than the cost of
 a pair of $100 cameras. I think that these are a real possibility for
 smaller organizations. The maturity of the software and workflow is
 problematic, but with Google’s Ocropus OCR software [4] freely
 available as the heart of a scanning workflow, the possibility is
 there. Both bkrpr and [3] have software currently available, although
 in the case of bkrpr at least the software is in the very early stages
 of development.

 best,
 Erik Hetzner

 1. 
 http://redjar.org/jared/blog/archives/2006/02/10/more-details-on-open-archives-scribe-book-scanner-project/
 
 2. http://bkrpr.org/doku.php
 3. 
 http://www.instructables.com/id/DIY-High-Speed-Book-Scanner-from-Trash-and-Cheap-C/
 
 4. http://code.google.com/p/ocropus/

 ;; Erik Hetzner, California Digital Library
 ;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] Recommend book scanner?

2009-04-30 Thread William Wueppelmann

Erik Hetzner wrote:

At Wed, 29 Apr 2009 13:32:08 -0400,
Christine Schwartz wrote:

We are looking into buying a book scanner which we'll probably use for
archival papers as well--probably something in the $1,000.00 range.

Any advice?


Most organizations, or at least the big ones, Internet Archive and
Google, seem to be using a design based on 2 fixed cameras rather than
a traditional scanner-type device. Is this what you had in mind?


This is probably the type of machine that will be needed for books if 
they need to remain bound throughout the scanning process. For looseleaf 
materials or for books that can be disbound and are in good condition, 
you can get inexpensive duplex sheet feeder scanners for a few hundred 
dollars that might be good enough.



Unfortunately none of these products are cheap. Internet Archive’s
Scribe machine cost upwards (3 years ago) of $15k, [1] mostly because
it has two very expensive cameras. Google’s data is unavailable. A
company called Kirtas also sells what look like very expensive
machines of a similar design.


$15K seems pretty cheap for that kind of scanner; most that I've seen 
run from the tens of thousands well into the hundreds, depending on the 
model and features. I don't remember precisely what IA's Scribe stations 
cost, but I think they were more in the range of $40-60K CAD; it would 
probably be cheaper in the US, but not that much cheaper, and I suspect 
that IA gets some sort of bulk discount for buying them by the truckload.


The main issues to consider are:

- Type of material: is it fragile or not; is it rare; can you afford to 
damage or destroy a copy during the scanning process; can the items be 
disbound; what are the minimum and maximum sizes of items to be scanned; 
if books are to remain bound, how tight are the bindings and how wide are 
the margins; paper thickness; existence of damage, water spotting, 
show-through, and other defects


- Scanning resolution required

- Image output (color/greyscale/black and white) and output format 
(TIFF, JPEG2000, PDF, JPEG).


- Throughput requirement. (How much stuff do you have: 
dozens/hundreds/thousands/millions of pages, and how quickly do you need 
to get it done: days/weeks/months/years?)


- How much technical work can you do/are you willing to do yourself? Can 
you invest in substantial post-processing, or do you need to be able to 
press "Go" on the scanner and produce a more or less finished product? 
And what sort of metadata, OCR, etc. requirements do you have, if 
any, in addition to getting the basic image?


For some projects, there are suitable desktop scanners available for 
very little money, and in some cases, using a decent (7 megapixel or 
higher) digital camera in conjunction with a stand and maybe an image 
editor like Photoshop (or something free like Irfanview) to crop and 
deskew afterwards might work just fine, but in other cases, a much more 
elaborate setup might be needed.
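The throughput question above lends itself to a quick back-of-envelope estimate. A hypothetical sketch (all rates and page counts are invented placeholders, not measured figures for any particular scanner):

```python
# Back-of-envelope scanning throughput estimate for the
# "how much stuff, and how quickly" question.

def days_needed(total_pages, pages_per_hour, hours_per_day=6):
    """Whole days of scanning required, rounding up."""
    pages_per_day = pages_per_hour * hours_per_day
    # Ceiling division without importing math:
    return -(-total_pages // pages_per_day)

# e.g. 100,000 pages at 200 pages/hour, 6 scanning hours a day:
print(days_needed(100_000, 200))  # -> 84
```

Running the numbers this way early on tends to make clear whether a few-hundred-dollar sheet feeder, a home-brew camera rig, or a production scanning station is the right scale of investment.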


--
William Wueppelmann
Systems Librarian/Programmer
Canadiana.org
http://www.canadiana.org


Re: [CODE4LIB] registering info: uris?

2009-04-30 Thread Ray Denenberg, Library of Congress

From: Ross Singer rossfsin...@gmail.com

Except that OpenURL and SRU /already use different info URIs to
describe the same things/.

info:srw/schema/1/marcxml-v1.1

info:ofi/fmt:xml:xsd:MARC21

or

info:srw/schema/1/onix-v2.0

info:ofi/fmt:xml:xsd:onix

What is the rationale for this?


None.  (Or, whatever rationale there was, historically, should no longer 
apply.)  These should be aligned.   Post this to the OpenURL list (and 
perhaps SRU as well).  I'm certainly willing to work to come up with a 
solution.


--Ray


[CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-04-30 Thread Ross Singer
Hello everybody.  I apologize for the crossposting, but this is an
area that could (potentially) affect every one of these groups.  I
realize that not everybody will be able to respond to all lists,
but...

First of all, some back story (Code4Lib subscribers can probably skip ahead):

Jangle [1] requires URIs to explicitly declare the format of the data
it is transporting (binary MARC, MARCXML, vCard, DLF
simpleAvailability, MODS, EAD, etc.).  In the past, it has used its
own URI structure for this (http://jangle.org/vocab/formats#...), but
this was always with the intention of moving out of the jangle.org
namespace into a more generic space so it could be used by other
initiatives.

This same concept came up in UnAPI [2] (I think this thread:
http://old.onebiglibrary.net/yale/cipolo/gcs-pcs-list/2006-March/thread.html#682
discusses it a bit - there is a reference there that it maybe had come
up before) although was rejected ultimately in favor of an (optional)
approach more in line with how OAI-PMH disambiguates metadata formats.
That being said, this page used to try to set some sort of convention
around the UnAPI formats:
http://unapi.stikipad.com/unapi/show/existing+formats
But it's now just a squatter page.

Jakob Voss pointed out that SRU has a schema registry and that it
would make sense to coordinate with this rather than mint new URIs for
things that have already been defined there:
http://www.loc.gov/standards/sru/resources/schemas.html

This, of course, made a lot of sense.  It also made me realize that
OpenURL *also* has a registry of metadata formats:
http://alcme.oclc.org/openurl/servlet/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc&set=Core:Metadata+Formats

The problem here is that OpenURL and SRW are using different info URIs
to describe the same things:

info:srw/schema/1/marcxml-v1.1

info:ofi/fmt:xml:xsd:MARC21

or

info:srw/schema/1/onix-v2.0

info:ofi/fmt:xml:xsd:onix

The latter technically isn't the same thing since the OpenURL one
claims it's an identifier for ONIX 2.1, but if I wasn't sending this
email now, eventually SRU would have registered
info:srw/schema/1/onix-v2.1

There are several other examples, as well (MODS, ISO20775, etc.) and
it's not a stretch to envision more in the future.

So there are a couple of questions here.

First, and most importantly, how do we reconcile these different
identifiers for the same thing?  Can we come up with some agreement on
which ones we should really use?

Secondly, and this gets to the reason why any of this was brought up
in the first place, how can we coordinate these identifiers more
effectively and efficiently to reuse among various specs and
protocols, but not:
1) be tied to a particular community
2) require some laborious and lengthy submission and review process just
to say "hey, here's my FOAF available via UnAPI"
3) be so lax that it throws all hope of authority out the window
?

I would expect the various communities to still maintain their own
registries of "approved" data formats (well, OpenURL and SRU, anyway
-- it's not as appropriate to UnAPI or Jangle).

Does something like this interest any of you?  Is there value in such
an initiative?

Thanks,
-Ross.


Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-04-30 Thread Ray Denenberg, Library of Congress
Thanks, Ross. For SRU, this is an opportune time to reconcile these 
differences.  Opportune, because we are approaching standardization of 
SRU/CQL within OASIS, and there will be a number of areas that need to 
change.


Some observations.

1. The 'ofi' namespace of 'info' has the advantage that the name, ofi, 
isn't necessarily tied to a community or application. (I suppose one could 
claim that the acronym ofi means "openURL something starting with 'f' 
for Identifiers", but it doesn't say so anywhere that I can find.)  However, 
the namespace itself (if not the name) is tied to OpenURL: "Namespace of 
Registry Identifiers used by the NISO OpenURL Framework Registry."  That 
seems like a simple problem to fix.  (Changing that title would not cause 
any technical problems.)


2. In contrast,  with the srw namespace,  the actual name is srw. So at 
least in name, it is tied to an application.


3. On the other side, the srw namespace has the distinct advantage of 
built-in extensibility.  For the URI info:srw/schema/1/onix-v2.0, the "1" 
is an authority.  There are (currently) 15 such authorities; they are 
listed in the (second) table at 
http://www.loc.gov/standards/sru/resources/infoURI.html


Authority 1  is the SRU maintenance agency, and the objects registered 
under that authority are, more-or-less, public. But objects can be defined 
under the other authorities with no registration process required.


4.  ofi does not offer this sort of extensibility.
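The structure described in observation 3 can be pulled apart mechanically. A small sketch (the function name is mine, and this assumes only the `info:srw/schema/<authority>/<schema-name>` shape described above):

```python
# Sketch of parsing the info:srw schema identifier structure:
# info:srw/schema/<authority>/<schema-name>

def parse_srw_schema_uri(uri):
    """Split an info:srw schema URI into (authority, schema name)."""
    prefix = "info:srw/schema/"
    if not uri.startswith(prefix):
        raise ValueError("not an info:srw schema URI: %r" % uri)
    authority, _, name = uri[len(prefix):].partition("/")
    return authority, name

print(parse_srw_schema_uri("info:srw/schema/1/onix-v2.0"))
# -> ('1', 'onix-v2.0')
```

The authority component is what gives the scheme its extensibility: anyone holding an authority number can coin new schema names under it without a central registration step.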


So, if we were going to unify these two systems (and I can't speak for the 
SRU community and commit to doing so yet) the extensibility offered by the 
srw approach would be an absolute requirement.   If it could somehow be 
built in to ofi,  then I would not be opposed to migrating the srw 
identifiers.   Another approach would be to register an entirely new 
'info:' URI namespace and migrate all of these identifiers to the new 
namespace.


--Ray


- Original Message - 
From: Ross Singer rossfsin...@gmail.com

To: z...@listserv.loc.gov
Sent: Thursday, April 30, 2009 2:59 PM
Subject: One Data Format Identifier (and Registry) to Rule Them All



Hello everybody.  I apologize for the crossposting, but this is an
area that could (potentially) affect every one of these groups.  I
realize that not everybody will be able to respond to all lists,
but...

First of all, some back story (Code4Lib subscribers can probably skip 
ahead):


Jangle [1] requires URIs to explicitly declare the format of the data
it is transporting (binary marc, marcxml, vcard, DLF
simpleAvailability, MODS, EAD, etc.).  In the past, it has used it's
own URI structure for this (http://jangle.org/vocab/formats#...) but
this was always been with the intention of moving out of the
jangle.org into a more generic space so it could be used by other
initiatives.

This same concept came up in UnAPI [2] (I think this thread:
http://old.onebiglibrary.net/yale/cipolo/gcs-pcs-list/2006-March/thread.html#682
discusses it a bit - there is a reference there that it maybe had come
up before) although was rejected ultimately in favor of an (optional)
approach more in line with how OAI-PMH disambiguates metadata formats.
That being said, this page used to try to set sort of convention
around the UnAPI formats:
http://unapi.stikipad.com/unapi/show/existing+formats
But it's now just a squatter page.

Jakob Voss pointed out that SRU has a schema registry and that it
would make sense to coordinate with this rather than mint new URIs for
things that have already been defined there:
http://www.loc.gov/standards/sru/resources/schemas.html

This, of course, made a lot of sense.  It also made me realize that
OpenURL *also* has a registry of metadata formats:
http://alcme.oclc.org/openurl/servlet/OAIHandler?verb=ListRecordsmetadataPrefix=oai_dcset=Core:Metadata+Formats

The problem here is that OpenURL and SRW are using different info URIs
to describe the same things:

info:srw/schema/1/marcxml-v1.1

info:ofi/fmt:xml:xsd:MARC21

or

info:srw/schema/1/onix-v2.0

info:ofi/fmt:xml:xsd:onix

The latter technically isn't the same thing since the OpenURL one
claims it's an identifier for ONIX 2.1, but if I wasn't sending this
email now, eventually SRU would have registered
info:srw/schema/1/onix-v2.1

There are several other examples, as well (MODS, ISO20775, etc.) and
it's not a stretch to envision more in the future.

So there are a couple of questions here.

First, and most importantly, how do we reconcile these different
identifiers for the same thing?  Can we come up with some agreement on
which ones we should really use?

Secondly, and this gets to the reason why any of this was brought up
in the first place, how can we coordinate these identifiers more
effectively and efficiently to reuse among various specs and
protocols, but not:
1) be tied to a particular community
2) require some laborious and lengthy submission and review process to
just say hey, here's my FOAF available via UnAPI
3) be so 

Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-04-30 Thread Peter Noerr
Some further observations. So far this threadling has mentioned only trying to 
unify two different sets of identifiers. However there are a much larger number 
of them out there (and even larger numbers of schemas and other 
standard-things-that-everyone-should-use-so-we-all-know-what-we-are-talking-about)
 and the problem exists for any of these things (identifiers, etc.) where there 
are more than one of them. So really unifying two sets of identifiers, while 
very useful, is not actually going to solve much.

Is there any broader methodology we could approach which potentially allows 
multiple unifications or (my favourite) cross-walks? (Complete unification 
requires that everybody agrees and sticks to it, and human history is sort of 
not on that track...) And who (people and organizations) would undertake this?

Ross' point that a lightweight approach is necessary for any sort of adoption 
is well taken, but this is a problem (one which plagues all we do in federated 
search) which cannot just be solved by another registry. Some person or 
organisation has to look at 
the identifiers or whatever and decide that two of them are identical or, 
worse, only partially overlap and hence scope has to be defined. In a syntax 
that all understand of course. Already in this thread we have the sub/super 
case question from Karen (in a post on the openurl (or Z39.88 sigh - 
identifiers!) listserv). And the various identifiers for MARC (below) could 
easily be for MARC-XML, MARC21-ISO2709, MARCUK-ISO2709. Now explain in words of 
one (computer understandable) syllable what the differences are. 

I'm not trying to make problems. There are problems and this is only a small 
subset of them, and they confound us every day. I would love to adopt standard 
definitions for these things, but which Standard? Because anyone can produce 
any identifier they like, we have decided that the unification of them has to 
be kept internal where we at least have control of the unifications, even if 
they change pretty frequently.

Peter


Dr Peter Noerr
CTO, MuseGlobal, Inc.

+1 415 896 6873 (office)
+1 415 793 6547 (mobile)
www.museglobal.com


 -Original Message-
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Ross Singer
 Sent: Thursday, April 30, 2009 12:00
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them
 All
 
 Hello everybody.  I apologize for the crossposting, but this is an
 area that could (potentially) affect every one of these groups.  I
 realize that not everybody will be able to respond to all lists,
 but...
 
 First of all, some back story (Code4Lib subscribers can probably skip
 ahead):
 
 Jangle [1] requires URIs to explicitly declare the format of the data
 it is transporting (binary marc, marcxml, vcard, DLF
 simpleAvailability, MODS, EAD, etc.).  In the past, it has used it's
 own URI structure for this (http://jangle.org/vocab/formats#...) but
 this was always been with the intention of moving out of the
 jangle.org into a more generic space so it could be used by other
 initiatives.
 
 This same concept came up in UnAPI [2] (I think this thread:
 http://old.onebiglibrary.net/yale/cipolo/gcs-pcs-list/2006-
 March/thread.html#682
 discusses it a bit - there is a reference there that it maybe had come
 up before) although was rejected ultimately in favor of an (optional)
 approach more in line with how OAI-PMH disambiguates metadata formats.
  That being said, this page used to try to set sort of convention
 around the UnAPI formats:
 http://unapi.stikipad.com/unapi/show/existing+formats
 But it's now just a squatter page.
 
 Jakob Voss pointed out that SRU has a schema registry and that it
 would make sense to coordinate with this rather than mint new URIs for
 things that have already been defined there:
 http://www.loc.gov/standards/sru/resources/schemas.html
 
 This, of course, made a lot of sense.  It also made me realize that
 OpenURL *also* has a registry of metadata formats:
 http://alcme.oclc.org/openurl/servlet/OAIHandler?verb=ListRecordsmetadataP
 refix=oai_dcset=Core:Metadata+Formats
 
 The problem here is that OpenURL and SRW are using different info URIs
 to describe the same things:
 
 info:srw/schema/1/marcxml-v1.1
 
 info:ofi/fmt:xml:xsd:MARC21
 
 or
 
 info:srw/schema/1/onix-v2.0
 
 info:ofi/fmt:xml:xsd:onix
 
 The latter technically isn't the same thing since the OpenURL one
 claims it's an identifier for ONIX 2.1, but if I wasn't sending this
 email now, eventually SRU would have registered
 info:srw/schema/1/onix-v2.1
 
 There are several other examples, as well (MODS, ISO20775, etc.) and
 it's not a stretch to envision more in the future.
 
 So there are a couple of questions here.
 
 First, and most importantly, how do we reconcile these different
 identifiers for the same thing?  Can we come up with some agreement on
 which ones we should really use?
 
 Secondly, and this gets to the reason why any of this was brought up
 in 

Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them All

2009-04-30 Thread Jonathan Rochkind
Crosswalk is exactly the wrong answer for this. Two very small overlapping 
communities of mostly library developers can surely agree on using the same 
identifiers, and then we make things easier for US.  We don't need to solve the 
entire universe of problems. Solve the simple problem in front of you in the 
simplest way that could possibly work and still leave room for future expansion 
and improvement. From that, we learn how to solve the big problems, when we're 
ready. Overreach and try to solve the huge problem including every possible use 
case, many of which don't apply to you but SOMEDAY MIGHT... and you end up with 
the kind of over-abstracted over-engineered 
too-complicated-to-actually-catch-on solutions that... we in the library 
community normally end up with. 

From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Peter Noerr 
[pno...@museglobal.com]
Sent: Thursday, April 30, 2009 6:37 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them 
All

Some further observations. So far this threadling has mentioned only trying to 
unify two different sets of identifiers. However there are a much larger number 
of them out there (and even larger numbers of schemas and other 
standard-things-that-everyone-should-use-so-we-all-know-what-we-are-talking-about)
 and the problem exists for any of these things (identifiers, etc.) where there 
are more than one of them. So really unifying two sets of identifiers, while 
very useful, is not actually going to solve much.

Is there any broader methodology we could approach which potentially allows 
multiple unifications or (my favourite) cross-walks. (Complete unification 
requires everybody agrees and sticks to it, and human history is sort of not on 
that track...) And who (people and organizations) would undertake this?

Ross' point about a lightweight approach is necessary for any sort of adoption, 
but this is a problem (which plagues all we do in federated search) which 
cannot just be solved by another registry. Somebody/organisation has to look at 
the identifiers or whatever and decide that two of them are identical or, 
worse, only partially overlap and hence scope has to be defined. In a syntax 
that all understand of course. Already in this thread we have the sub/super 
case question from Karen (in a post on the openurl (or Z39.88 sigh - 
identifiers!) listserv). And the various identifiers for MARC (below) could 
easily be for MARC-XML, MARC21-ISO2709, MARCUK-ISO2709. Now explain in words of 
one (computer understandable) syllable what the differences are.

I'm not trying to make problems. There are problems and this is only a small 
subset of them, and they confound us every day. I would love to adopt standard 
definitions for these things, but which Standard? Because anyone can produce 
any identifier they like, we have decided that the unification of them has to 
be kept internal where we at least have control of the unifications, even if 
they change pretty frequently.

Peter


Dr Peter Noerr
CTO, MuseGlobal, Inc.

+1 415 896 6873 (office)
+1 415 793 6547 (mobile)
www.museglobal.com


 -Original Message-
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Ross Singer
 Sent: Thursday, April 30, 2009 12:00
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: [CODE4LIB] One Data Format Identifier (and Registry) to Rule Them
 All

 Hello everybody.  I apologize for the crossposting, but this is an
 area that could (potentially) affect every one of these groups.  I
 realize that not everybody will be able to respond to all lists,
 but...

 First of all, some back story (Code4Lib subscribers can probably skip
 ahead):

 Jangle [1] requires URIs to explicitly declare the format of the data
 it is transporting (binary marc, marcxml, vcard, DLF
 simpleAvailability, MODS, EAD, etc.).  In the past, it has used it's
 own URI structure for this (http://jangle.org/vocab/formats#...) but
 this was always been with the intention of moving out of the
 jangle.org into a more generic space so it could be used by other
 initiatives.

 This same concept came up in UnAPI [2] (I think this thread:
 http://old.onebiglibrary.net/yale/cipolo/gcs-pcs-list/2006-
 March/thread.html#682
 discusses it a bit - there is a reference there that it maybe had come
 up before) although was rejected ultimately in favor of an (optional)
 approach more in line with how OAI-PMH disambiguates metadata formats.
  That being said, this page used to try to set sort of convention
 around the UnAPI formats:
 http://unapi.stikipad.com/unapi/show/existing+formats
 But it's now just a squatter page.

 Jakob Voss pointed out that SRU has a schema registry and that it
 would make sense to coordinate with this rather than mint new URIs for
 things that have already been defined there:
 http://www.loc.gov/standards/sru/resources/schemas.html

 This, of course, made a lot of 

[CODE4LIB] job posting: interface programmer, University of Michigan

2009-04-30 Thread Morse, Jeremy
Please forward to anyone who may be interested in this position.  This
position is also listed in U-M jobs site ( http://www.umich.edu/~jobs/ )
under posting #30698.

###

Scholarly Publishing Office
University of Michigan University Library
Interface Programmer


The Scholarly Publishing Office (SPO) of the University of Michigan
University Library is seeking an Interface Programmer for a full-time
two-year term appointment with the possibility of renewal.

The SPO Interface Programmer works in a team environment for online
publishing of scholarly literature and is primarily responsible for
implementing interfaces for a broad variety of scholarly publications,
making contributions to interface design and usability testing, using
open standards and open-source software.  The Interface Programmer will
also contribute to the design and development of online publishing
tools, including content management systems.

Work will primarily focus on coding publication-specific interface
customizations for SPO's locally developed publishing platform, DLXS
(see http://www.dlxs.org). Other projects may include: system-wide
improvements to the DLXS interface; implementation of a content management
system for digitalculturebooks (an online book series); interface
support for collaborative publishing projects with the University of
Michigan Press; interface specification for a database of 20
disciplinarily related journals; assessment and implementation of
strategies to increase the discoverability of SPO publications;
participation in a review of electronic publishing platforms.

DUTIES:
*   Implement interface customizations for DLXS publications, fulfilling
requests by the editors (40%)
*   Maintain and develop CMS-based publishing tools and services (35%)
*   Evaluate, recommend, and implement DLXS-wide interface improvements 
(15%)
*   Assist in research and evaluation of publishing software solutions (5%)
*   Document interface features for end users and other digital library
developers (5%)

QUALIFICATIONS:
REQUIRED: Bachelor's degree plus three years of relevant experience as a
programmer, plus one year of experience with Web or other interface
design. Thorough knowledge of XHTML, XML, CSS, and at least one
high-level programming language (Perl or PHP preferred), and experience
with a UNIX-like OS. Excellent written and oral communication skills.
Ability to work in a team environment. Attention to detail.

DESIRED: Experience with XSLT. Experience in graphic design for
electronic media. Knowledge of and experience with digital libraries or
electronic publishing. Proficiency with content management system
development and design (particularly Drupal). Experience with relational
databases and SQL. Experience using online scholarly resources.
Experience with theory and practice of usability testing and information
architecture. ALA-accredited master's degree in library or information
studies, or equivalent advanced degree and experience.



ABOUT SPO:
The services of SPO are part of the Library's service to the University
of Michigan, and are developed in keeping with the Library's concern
about issues of intellectual property, long-term retention and archiving
of content, and its support of scholarship in all forms. SPO is
currently responsible for the online publication of a range of scholarly
literature, including significant text-based journal collections,
scholarly monographs, conference proceedings, scholarly bibliographies,
and image collections.

SPO is a highly collaborative work environment in which staff engage in
both the daily work of publishing and in broader discussions of
scholarly communication and the transformational potential of digital
communication technology. All staff are involved in direction-setting,
in articulating a vision of the library as scholarly publisher, and in
putting that vision into practice. We have an inclination toward open
source software and balanced, pragmatic approaches to problems. We work
to embrace emerging technology standards and best practices of the
digital library community.

Rank is anticipated at the level of Programmer Analyst Intermediate, or
Assistant or Associate Librarian. Positions receive 24 days of vacation
a year; 15 days of sick leave a year with provisions for extended
benefits, as well as opportunities for travel and professional
development. Retirement: TIAA-CREF or Fidelity Investments.

To apply, please send cover letter and copy of resume to:

Library Human Resources
404 Hatcher Library North
University of Michigan
Ann Arbor, MI 48109-1205

Contact libhum...@umich.edu or 734-764-2546 for further information.