[CODE4LIB] Job Term Extended! Re-post of Stanford Libraries Digitization Workflow Engineer

2010-06-16 Thread Cathy Aster
PLEASE NOTE: This position has been re-posted with the term now extended
from the original 12 to 24 months.  If you have applied previously, you
*must* reapply to be considered.  Thanks!

To apply for the position described, please go to the Stanford University
online job application system, and search for Requisition #37588:
http://jobs.stanford.edu/

*Request no direct phone calls or emails, please.*

Digitization Workflow Engineer

Fixed Term for 24 months

Overview

Stanford University Libraries and Academic Information Resources (SULAIR)
have an ongoing program to produce and archive digital reproductions of
library materials. Digital Library Systems and Services (DLSS) manages and
operates several labs dedicated to digitization of print, audio and video
materials, and is building a digital library infrastructure to preserve and
provide access to these digitized materials.

Under the supervision of the Manager of Web Application Development in DLSS,
the Digitization Workflow Engineer will be responsible for building and
implementing systems that help manage the lifecycle of digitized objects.
This lifecycle begins with the object's selection for digitization, and ends
with its publication on the World Wide Web and preservation in the Stanford
Digital Repository. Other steps include metadata creation, digitization,
quality control, file cleanup, derivative creation and file validation. The
workflow systems implemented by the Engineer will focus on digitization
processes and preparation of files for online access and preservation systems.

This is primarily an engineering position, with responsibility for building
and implementing automated and manual tools and interfaces to support the
digitization labs. The workflow engineer will work closely with the lab
managers, the QA specialist, project managers and project coordinators to
build tools and systems that support individual projects and ongoing
digitization activities. The workflow engineer will also work closely with
the DLSS architect and other DLSS software developers to use, extend and
integrate with the existing digital library infrastructure and related services.

Primary Responsibilities

- Build or integrate tools for metadata creation. This may include online
forms for manually creating and editing XML metadata descriptions, and
automated tools for extracting embedded metadata values, text conversion
(OCR) or structural and logical markup.
- Develop an end-to-end workflow system for digitization labs that automates as
much as possible file naming, movement of files from step to step, logging
of errors, workflow tracking, file validation, file processing and
derivative creation. The workflow systems should prepare files for online
access and preservation systems, and will integrate with (and leverage as
much as possible) the Libraries’ digital infrastructure.
- Build an online digitization project management system to facilitate
assignment of work, flagging of exceptions, tracking of progress and
reporting of project status.
- Develop algorithms and build tools to support format-specific digitization
workflow. This may include manipulations of or enhancements to digital
texts, images, audio files, video files, map and geospatial data, or born
digital materials.

Required Knowledge and Expertise

2-3 years of professional software engineering experience is required.

- Participation in at least one application development project using Ruby
on Rails or Java. Familiarity with a range of programming and scripting
languages is essential.
- Demonstrated proficiency building applications in the Ruby on Rails
development framework.
- Demonstrated proficiency in scripting simple utilities, using Ruby, Perl,
shell scripts, or Python.
- Demonstrated ability to write solid, simple, elegant code both
independently and in a team-programming environment and within schedule
limitations.
- In-depth knowledge of HTML and related website development technologies
and software (especially CSS and PHP).
- Demonstrated expertise with XML and related tools and technologies (e.g.,
XML schema, schema management and databases, XSLT, XForms).
- Experience with relational database design and management. Experience
implementing database applications for SQL Server, Oracle, or MySQL.
- Demonstrated ability to work independently on a project from specification
to launch; communicate effectively, orally and in writing; and work with all
levels of staff, vendors, and consultants.
- Demonstrated ability to work collaboratively on a project from
specification to launch; and to work with multiple levels of staff, and
colleagues at peer institutions and in open source communities.
- Demonstrated ability to develop new programming skills quickly, and to
grasp unfamiliar architectures and application designs quickly.
- Demonstrated proficiency applying best practices to technical projects,
especially test-first development and automated testing. Also must make
effective use of team c

Re: [CODE4LIB] Get together in DC during ALA?

2010-06-16 Thread Rosalyn Metz
Joe,

I would be there and can recommend some places (being a former local
but still a local at heart).

Rosalyn



On Wed, Jun 16, 2010 at 1:02 PM, Bryan Baldus wrote:
> On Wednesday, June 16, 2010 11:55 AM, Joe Hourcle wrote:
>>We had pretty good turn out the last time we had a code4lib dinner during an 
>>ALA meeting in DC a few years back.
>>Are there enough code4lib people either going to ALA or local to make it 
>>worth trying to organize again?
>
> If the timing works out, I'd be interested in participating (perhaps Monday 
> evening, which I believe is when it took place last time [1])
>
> Talk to you later,
>
> Bryan Baldus
> Cataloger
> Quality Books Inc.
> The Best of America's Independent Presses
> 1-800-323-4241x402
> bryan.bal...@quality-books.com
> eij...@cpan.org
> http://home.comcast.net/~eijabb/
>
> [1] Message from last time:
>
> -Original Message-
> From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
> Sent: Monday, June 18, 2007 11:33 AM
> To: CODE4LIB@listserv.nd.edu
> Subject: [CODE4LIB] Informal get together Monday of ALA
>
> Some of us have spontaneously decided to have an informal Code4Lib get 
> together the Monday of ALA in DC.
>
> We will meet on Monday the 25th of June at 8pm, at "RFD", which was recommended 
> by anarchivist and appears to be a pub and "Washington's Largest 
> Multi-Tap".  It's located just a couple blocks from the convention center.
>
> http://www.lovethebeer.com/rfd.html
>
> Some of the Talis crew have said they will be there. I will be there.
> Anarchivist and edsu have said they'll be there. (I forget if I just made up 
> edsu).
>
> Please join us! Any and everyone interested in meeting code4lib folks or 
> other assorted library technologists and library geeks and hangers on are 
> welcome.
>
> No, I wasn't planning on making a reservation or anything. No, I have no idea 
> how we'll all find each other. I think it'll work out.
>
> Jonathan
>


[CODE4LIB] [ANNOUNCEMENT] : June 2010 issue of ITALica, a weblog on libraries and information technology...

2010-06-16 Thread Andy Boze

Cross-posted; apologies for duplication.

Hello friends,

The June 2010 issue of /Information Technology and Libraries/ (ITAL), 
LITA's peer-reviewed quarterly journal, is online and accessible to all 
LITA members. Issues older than six months are open to all. ITAL's main 
page is at 
.


ITALica, the weblog discussion area for 
ITAL, has been updated with information about the latest issue. ITALica 
features supplementary materials not included with the regular print and 
electronic versions of /Information Technology and Libraries/, such as 
"letters to the editor", updates to articles, and other materials we 
can't work into the journal. One of the most important features of 
ITALica is a forum for readers' conversations with our authors, wherein 
authors host and monitor discussion for a period of time after 
publication of their articles, so that you then have a chance to 
interact with them.


ITALica offers you the opportunity to discuss with the following ITAL 
authors their papers in the latest issue:


"Usability Studies of Faceted Browsing: A Literature Review" /
Jody Condit Fagan

"Reducing Psychological Resistance to Digital Repositories" /
Brian Quinn

"Web Services and Widgets for Library Information Systems" /
Godmar Back and Annette Bailey

"TUTORIAL: On the Clouds: A New Way of Computing" /
Yan Han

"FROM OUR READERS: The New User Environment: The End of Technical 
Services?" /
Bradford Lee Eden

No membership is required to view or participate in ITALica. We hope to 
see you there!


--
Andy Boze
Web site Manager, ITAL, for the Editorial Board


Re: [CODE4LIB] WorldCat as an OpenURL endpoint ?

2010-06-16 Thread Robertson, Wendy C
Regarding the data in OCLC, my understanding (as a former serials cataloger) is 
that there is detailed information for at least some institutions in the 
interlibrary loan portion of the OCLC database but this is not available via 
WorldCat. I know our ILL department added detailed information for commonly 
requested titles years ago. I also know we are in the process of getting our 
detailed holdings loaded into OCLC (possibly just on the ILL side, I'm not sure 
about this) and maintaining our holdings through batch updates. Many of our 
current titles use summary holdings, but not all do. I believe the summary 
holdings work much more effectively with ILL as well so our serials catalogers 
have been working for years to improve our local data. As part of our move to 
summary holdings, we also reduced some of the detail in our holdings, so now we 
show only gaps of entire volumes, but not specific missing issues in our coded 
holdings (the missing issues are included in notes in our item-specific
records).

If there is better data available to ILL staff, this may be an avenue you could 
pursue.

Wendy Robertson
Digital Resources Librarian .  The University of Iowa Libraries
1015 Main Library  .  Iowa City, Iowa 52242
wendy-robert...@uiowa.edu
319-335-5821

-Original Message-
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Bill 
Dueber
Sent: Tuesday, June 15, 2010 8:57 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] WorldCat as an OpenURL endpoint ?

On Tue, Jun 15, 2010 at 5:49 PM, Kyle Banerjee  wrote:
> No, but parsing holding statements for something that just gets cut off
> early or which starts late should be easy unless entry is insanely
> inconsistent.

And there it is. :-)

We're really dealing with a few problems here:

 - Inconsistent entry by catalogers (probably the least of our worries)
 - Inconsistent publishing schedules (e.g., the Jan 1942 issue was
just plain never printed)
 - Inconsistent use of volume/number/year/month/whatever throughout a
serial's run.

So, for example, http://mirlyn.lib.umich.edu/Record/45417/Holdings#1

There are six holdings:

1919-1920 incompl
1920 incompl.
1922
v.4 no.49
v.6 1921 jul-dec
v.6 1921jan-jun

We have no way of knowing what year volume 4 was printed in, which
issues are incomplete in the two volumes that cover 1920, whether
volume numbers are associated with earlier (or later) issues, etc. We,
as humans, could try to make some guesses, but they'd just be guesses.

It's easy to find examples where month ranges overlap (or leave gaps),
where month names and issue numbers are sometimes used
interchangeably, where volume numbers suddenly change in the middle of
a run because of a merge with another serial (or where the first
volume isn't "1" because the serial broke off from a parent), etc.
etc. etc.

I don't mean to overstate the problem. For many (most?) serials whose
existence only goes back a few decades, a relatively simple approach
will likely work much of the time -- although even that relatively
simple approach will have to take into account a solid dozen or so
different ways that enumcron data may have been entered.

But to be able to say, with some confidence, that we have the full
run? Or a particular issue as labeled by a month name? Much, much
harder in the general case.
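
To make the ambiguity concrete, here is a rough Python sketch (not code from this thread) that pulls whatever volume, number, year, and month data it can out of the six holdings strings above. The `Holding` class, the field names, and the regular expressions are all assumptions made for illustration; real enumcron data would need many more patterns. Note what it cannot do: for "v.4 no.49" no year comes out, because none was ever recorded.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

MONTH = r"(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)"


@dataclass
class Holding:
    """One free-text holdings statement plus whatever could be extracted."""
    raw: str
    volume: Optional[int] = None
    number: Optional[int] = None
    years: Optional[Tuple[int, int]] = None
    months: Optional[Tuple[str, str]] = None
    incomplete: bool = False


def parse_holding(raw: str) -> Holding:
    """Best-effort extraction; fields that are not stated stay None."""
    text = raw.lower()
    holding = Holding(raw=raw, incomplete="incompl" in text)

    match = re.search(r"\bv\.\s*(\d+)", text)
    if match:
        holding.volume = int(match.group(1))

    match = re.search(r"\bno\.\s*(\d+)", text)
    if match:
        holding.number = int(match.group(1))

    # A four-digit year, optionally followed by "-YYYY".
    match = re.search(r"(?<!\d)(\d{4})(?:\s*-\s*(\d{4}))?(?!\d)", text)
    if match:
        first = int(match.group(1))
        last = int(match.group(2)) if match.group(2) else first
        holding.years = (first, last)

    # No word boundary before the month: "1921jan-jun" has no space.
    match = re.search(MONTH + r"\s*-\s*" + MONTH, text)
    if match:
        holding.months = (match.group(1), match.group(2))

    return holding


if __name__ == "__main__":
    statements = [
        "1919-1920 incompl", "1920 incompl.", "1922",
        "v.4 no.49", "v.6 1921 jul-dec", "v.6 1921jan-jun",
    ]
    for statement in statements:
        print(parse_holding(statement))
```

Even when every field parses, the sketch only records what was typed; deciding whether the run is complete is a separate and much harder step.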


  -Bill-


-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library


Re: [CODE4LIB] Get together in DC during ALA?

2010-06-16 Thread Bryan Baldus
On Wednesday, June 16, 2010 11:55 AM, Joe Hourcle wrote:
>We had pretty good turn out the last time we had a code4lib dinner during an 
>ALA meeting in DC a few years back.
>Are there enough code4lib people either going to ALA or local to make it worth 
>trying to organize again?

If the timing works out, I'd be interested in participating (perhaps Monday 
evening, which I believe is when it took place last time [1])

Talk to you later,

Bryan Baldus
Cataloger
Quality Books Inc.
The Best of America's Independent Presses
1-800-323-4241x402
bryan.bal...@quality-books.com
eij...@cpan.org
http://home.comcast.net/~eijabb/

[1] Message from last time:

-Original Message-
From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
Sent: Monday, June 18, 2007 11:33 AM
To: CODE4LIB@listserv.nd.edu
Subject: [CODE4LIB] Informal get together Monday of ALA

Some of us have spontaneously decided to have an informal Code4Lib get together 
the Monday of ALA in DC.

We will meet on Monday the 25th of June at 8pm, at "RFD", which was recommended by 
anarchivist and appears to be a pub and "Washington's Largest Multi-Tap".  
It's located just a couple blocks from the convention center.

http://www.lovethebeer.com/rfd.html

Some of the Talis crew have said they will be there. I will be there.
Anarchivist and edsu have said they'll be there. (I forget if I just made up 
edsu).

Please join us! Any and everyone interested in meeting code4lib folks or other 
assorted library technologists and library geeks and hangers on are welcome.

No, I wasn't planning on making a reservation or anything. No, I have no idea 
how we'll all find each other. I think it'll work out.

Jonathan


[CODE4LIB] Get together in DC during ALA?

2010-06-16 Thread Joe Hourcle
We had pretty good turn out the last time we had a code4lib dinner during 
an ALA meeting in DC a few years back.


Are there enough code4lib people either going to ALA or local to make it 
worth trying to organize again?


(I'd have to look back at my e-mails to see if we were actually organized 
last time ... if we were, it wasn't my doing).


-Joe


Re: [CODE4LIB] code4lib.hu codesprint report

2010-06-16 Thread Király Péter
- Original Message - 
From: "Mark A. Matienzo" 

On Wed, Jun 16, 2010 at 11:13 AM, Karen Coyle  wrote:


Would it be appropriate for the C4L site to link to Péter's group's page?


Of course it would.


I have done it. Thanks!

Péter
eXtensible Catalog 


Re: [CODE4LIB] code4lib.hu codesprint report

2010-06-16 Thread Mark A. Matienzo
On Wed, Jun 16, 2010 at 11:13 AM, Karen Coyle  wrote:
>
> Would it be appropriate for the C4L site to link to Péter's group's page?

Of course it would. The link for code4lib.hu on wiki.code4lib.org
currently just points to Péter's original inquiry about if there would
be interest in him creating such a group. Anyone should feel free to
replace that with an appropriate link with information about the
group.

Mark A. Matienzo
Digital Archivist, Manuscripts and Archives
Yale University Library


Re: [CODE4LIB] code4lib.hu codesprint report

2010-06-16 Thread Karen Coyle

Péter,

It's great to see this expansion of CODE4LIB. It sounds like you had a  
very successful meeting.


Would it be appropriate for the C4L site to link to Péter's group's page?

kc


Quoting Király Péter:


Hi!

I am glad to report that we held the first code4lib.hu codesprint yesterday.
The purpose was to code together and learn from each other. It was a
3.5-hour session at the National Széchényi Library, Budapest. We created a
script that extracts ISBNs and book cover images embedded in METS records
served by an OAI-PMH data provider. Hopefully this code will become part of
two or three different library or book-related services in the coming months.
We also discussed the technical details, the advantages, and the rights
problems of uploading a local history photo collection to Flickr.
Unfortunately, we didn't have time to code the Flickr part.
There were only a couple of coders, but we had a good talk and made new
acquaintances. (For those in #code4lib: this time we had no bbq, nor 'slambuc',
but lots of biscuits and mineral water. ;-)

If - for whatever reason - you want to follow or join us, see our group page:
http://groups.google.com/group/ikr-fejlesztok/

The meeting was run as a section of the Library's K2 (library 2.0)
task force's workshop about the usage of library 2.0 tools.
http://blog.konyvtar.hu/k2/

Some technical details:
- we use PHP as the common language
- for OAI-PMH harvesting we use Omeka's OAI harvester plugin
- for Flickr communication we planned to use Phlickr, a PHP library
- the OAI server we harvested runs at the University of Debrecen and is based
on DSpace
- we found a bug in the Ubuntu version of PHP 5.2.10 (SimpleXMLElement has
a problem with the xpath() method), but we found a workaround as well.
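
For readers who want a feel for the ISBN extraction step, here is a minimal sketch. It is not the codesprint script (that was written in PHP and used Omeka's harvester); it is a Python illustration under stated assumptions: that each OAI-PMH record wraps a METS document, and that the ISBN sits in a MODS `identifier` element with `type="isbn"`. The sample response, the `oai:example.org:1` identifier, and the function name `extract_isbns` are hypothetical. Cover images are not handled.

```python
import re
import xml.etree.ElementTree as ET

# Namespace URIs for OAI-PMH, METS, and MODS.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "mets": "http://www.loc.gov/METS/",
    "mods": "http://www.loc.gov/mods/v3",
}

# Hypothetical ListRecords response; a real provider may lay out METS differently.
SAMPLE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <mets:mets xmlns:mets="http://www.loc.gov/METS/"
                   xmlns:mods="http://www.loc.gov/mods/v3">
          <mets:dmdSec ID="dmd1">
            <mets:mdWrap MDTYPE="MODS">
              <mets:xmlData>
                <mods:mods>
                  <mods:identifier type="isbn">978-3-16-148410-0</mods:identifier>
                  <mods:identifier type="uri">http://example.org/1</mods:identifier>
                </mods:mods>
              </mets:xmlData>
            </mets:mdWrap>
          </mets:dmdSec>
        </mets:mets>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()


def extract_isbns(xml_text):
    """Map each OAI record identifier to the normalized ISBNs found in it."""
    root = ET.fromstring(xml_text)
    result = {}
    for record in root.iterfind(".//oai:record", NS):
        oai_id = record.findtext("oai:header/oai:identifier",
                                 default="", namespaces=NS)
        isbns = []
        for element in record.iterfind(".//mods:identifier[@type='isbn']", NS):
            # Drop hyphens and spaces; keep digits and a possible X check digit.
            digits = re.sub(r"[^0-9Xx]", "", element.text or "").upper()
            if len(digits) in (10, 13):
                isbns.append(digits)
        result[oai_id] = isbns
    return result


if __name__ == "__main__":
    print(extract_isbns(SAMPLE))
```

The sketch only checks the length of each ISBN; validating the check digit would be a natural next step.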

Regards,
Péter




--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet


Re: [CODE4LIB] new version of cql-ruby

2010-06-16 Thread Jonathan Rochkind

Jakob Voss wrote:


Is there a nice piece of code or a tutorial or example how to easily 
wrap your Solr instance to get a full SRU/SRW and/or OpenSearch 
interface? Converting CQL to Solr query format is just one part of a 
wrapper isn't it?
  

You are right, this is a building block, not a complete implementation.

For those using Blacklight, I will soon have a plugin done that provides 
CQL input to Blacklight. It is NOT a full SRU/SRW implementation 
(although it is perhaps the most important building block towards one; I'm 
not sure I'm interested in that myself and probably won't be working on 
it), and neither is it advertised in a Blacklight OpenSearch description 
(which I _am_ interested in, but there are both Blacklight architectural 
issues and "standardization" issues with that; while Tony's proto-spec 
for SRU-in-OpenSearch is a great start, I am not happy with some of it 
for my use cases).  But it will very easily provide for CQL input to a  
Blacklight app.


One building block at a time, I'm working on what I need for my local 
requirements, and trying to make what I work on re-useable building 
blocks for others.


Jonathan


Re: [CODE4LIB] WorldCat as an OpenURL endpoint ?

2010-06-16 Thread Rosalyn Metz
Don't forget inconsistent data from the person sending the OpenURL.

Rosalyn



On Tue, Jun 15, 2010 at 9:56 PM, Bill Dueber  wrote:
> On Tue, Jun 15, 2010 at 5:49 PM, Kyle Banerjee  wrote:
>> No, but parsing holding statements for something that just gets cut off
>> early or which starts late should be easy unless entry is insanely
>> inconsistent.
>
> And there it is. :-)
>
> We're really dealing with a few problems here:
>
>  - Inconsistent entry by catalogers (probably the least of our worries)
>  - Inconsistent publishing schedules (e.g., the Jan 1942 issue was
> just plain never printed)
>  - Inconsistent use of volume/number/year/month/whatever throughout a
> serial's run.
>
> So, for example, http://mirlyn.lib.umich.edu/Record/45417/Holdings#1
>
> There are six holdings:
>
> 1919-1920 incompl
> 1920 incompl.
> 1922
> v.4 no.49
> v.6 1921 jul-dec
> v.6 1921jan-jun
>
> We have no way of knowing what year volume 4 was printed in, which
> issues are incomplete in the two volumes that cover 1920, whether
> volume numbers are associated with earlier (or later) issues, etc. We,
> as humans, could try to make some guesses, but they'd just be guesses.
>
> It's easy to find examples where month ranges overlap (or leave gaps),
> where month names and issue numbers are sometimes used
> interchangeably, where volume numbers suddenly change in the middle of
> a run because of a merge with another serial (or where the first
> volume isn't "1" because the serial broke off from a parent), etc.
> etc. etc.
>
> I don't mean to overstate the problem. For many (most?) serials whose
> existence only goes back a few decades, a relatively simple approach
> will likely work much of the time -- although even that relatively
> simple approach will have to take into account a solid dozen or so
> different ways that enumcron data may have been entered.
>
> But to be able to say, with some confidence, that we have the full
> run? Or a particular issue as labeled by a month name? Much, much
> harder in the general case.
>
>
>  -Bill-
>
>
> --
> Bill Dueber
> Library Systems Programmer
> University of Michigan Library
>


Re: [CODE4LIB] new version of cql-ruby

2010-06-16 Thread LeVan,Ralph
> -Original Message-
> From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
> Jakob Voss
> 
> Is there a nice piece of code or a tutorial or example how to easily
> wrap your Solr instance to get a full SRU/SRW and/or OpenSearch
> interface? Converting CQL to Solr query format is just one part of a
> wrapper isn't it?

I've got an open source SRW/U servlet (http://code.google.com/p/oclcsrw/)
that wraps other databases.  It currently has implementations for Lucene
and DSpace.  (Well, we've got a few other interfaces here at work too.)
Adding Solr to that list should be easy, but I'm aware of at least 2
independent attempts that never went anywhere.  I'd be glad to help if
you're interested.

Ralph


[CODE4LIB] Who wants to go camping? - Invitation to Islandora Camp 2010

2010-06-16 Thread Mark Leggott
Islandora wants you. Yes, that’s right, you. We’re busy building the
Islandora ecosystem (led by Drupal and Fedora) and in order to accomplish
this, we need to assemble all interested developers and repository managers
in one place where ideas and energy can flow. What better way to do this
than to organize a super boring conference where everyone can wear name
tags, eat bland food, and feel awkward and stuffy?

Sound good? Well, if that’s what you’re looking for, then we suggest you
look elsewhere. This is an un-conference conference, the kind where we call
it a camp, have a yurt as our logo and ask you to relax and enjoy all the
rich food and culture Prince Edward Island has to offer at the height of its
most treasured summer season.

That’s not to say we’ll be sitting around resting on our laurels, no. While
the full details on the ins and outs of the Camp are still unfolding, what
we can say for certain is that it will be 2 packed days with your colleagues
from the larger Islandora community, with one full day of sessions from
developers and repository managers – organized by you – and one full day of
road-mapping, brainstorming plus install-fest, hackfest, and in keeping with
our theme, un-conference style discussions.

After a long day of hacking, developing, exchanging ideas, that’s when we
carry on 2 fun-filled evenings of seafood, seafood, seafood (other food) and
general good times with your own growing community.

If you’re interested in Islandora Camp (and by this point, how could you not
be?) check out our website
http://www.islandora.ca

You’ll notice this is conveniently placed immediately prior to our 3rd Annual
RIRI workshop, which is a week-long (July 26 – 30) Repository Institute with
a focus on Fedora. PEI is fantastic, especially in the summer, so if you’re
wise, you’ll consider both. http://vre2.upei.ca/riri/

And if you like tapas, we also suggest you join us in Spain at the Open
Repositories Conference in Madrid (July 6 – 9)
http://or2010.fecyt.es/Publico/WorkShop/index.aspx.

We look forward to camping with you!

**Apologies for any cross postings. Please pass along to interested
colleagues.


Re: [CODE4LIB] new version of cql-ruby

2010-06-16 Thread Ed Summers
On Wed, Jun 16, 2010 at 8:28 AM, Jakob Voss  wrote:
> P.S: By the way we created yet another mailing list for Solr in libraries to
> discuss such things:
>
> http://groups.google.com/group/solr4lib

ugh

:-)

//Ed


Re: [CODE4LIB] new version of cql-ruby

2010-06-16 Thread Eric Lease Morgan
On Jun 16, 2010, at 8:28 AM, Jakob Voss wrote:

> Is there a nice piece of code or a tutorial or example how to easily 
> wrap your Solr instance to get a full SRU/SRW and/or OpenSearch 
> interface? Converting CQL to Solr query format is just one part of a 
> wrapper isn't it?



In a blog posting, I "illustrated" how to glue an SRU interface onto a Solr 
index. [1, 2] The illustration uses Perl. Yes, CQL is just one part of the SRU 
implementation, and again, using Perl I found the module named CQL::Parser to 
do just what I needed. [3] There is also a nice Perl module for handling the 
SRU-specific stuff. [4]

Much of this good work was done by Ed S.  edsu++


[1] posting - http://tinyurl.com/25csnb3
[2] example - http://infomotions.com/sandbox/solr-sru/
[3] CQL - http://search.cpan.org/~bricas/CQL-Parser-1.10/lib/CQL/Parser.pm
[4] SRU - http://search.cpan.org/~bricas/SRU-0.99/

-- 
Eric Lease Morgan
University of Notre Dame


Re: [CODE4LIB] WorldCat as an OpenURL endpoint ?

2010-06-16 Thread Tom Keays
We have been trying to enumerate serials holdings as explicitly as possible.
E.g., this microfiche supplement to a journal,
http://summit.syr.edu/cgi-bin/Pwebrecon.cgi?BBID=274291 shows apparently
missing issues. However, there are two pieces of inferred information here:

1) that every print issue had a corresponding microfiche supplement (they
didn't, so most of these are complete even with the "gaps")
2) that volumes, at least up until 1991, had only 26 issues (that is
probably true, but it is not certain); there is no way to be certain
how many issues per volume were published from 1992 on (28?, 52?)

v.95:no.3 (1973)-v.95:no.8 (1973)
v.95:no.10 (1973)-v.95:no.26 (1973)
v.96 (1974)-v.97 (1975)
v.98:no.1 (1976)-v.98:no.14 (1976)
v.98:no.16 (1976)-v.98:no.26 (1976)
v.99:no.1 (1977)-v.99:no.25 (1977)
v.100 (1978)-v.108 (1986)
v.109:no.1 (1987)-v.109:no.19 (1987)
v.109:no.21 (1987)-v.109:no.26 (1987)
v.110 (1988)-v.111 (1989)
v.112:no.1 (1990)-v.112:no.26 (1990)
v.113 (1991)
v.114:no.1 (1992)-v.114:no.21 (1992)
v.114:no.23 (1992)-v.114:no.27 (1992)
v.115 (1993)-v.119 (1997)
v.120:no.2 (1998:Jan.21)-v.120:no.51 (1998:Dec.30)
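
Explicit statements like these are at least machine-checkable. The sketch below is a hedged Python illustration (the function name `interior_gaps` and the regular expression are my own assumptions, not part of any catalog system): it reads the "v.N:no.M" endpoints and lists issue numbers missing between the first and last held issue of each volume. It deliberately says nothing about issues before the first or after the last held issue, since that would require knowing how many issues each volume had, which is exactly the inferred information described above.

```python
import re
from collections import defaultdict

# Matches one "v.95:no.3" style endpoint of a holdings range.
ENDPOINT = re.compile(r"v\.(\d+):no\.(\d+)")


def interior_gaps(statements):
    """Return {volume: [missing issue numbers]} for gaps inside held ranges."""
    held = defaultdict(set)
    for line in statements:
        endpoints = ENDPOINT.findall(line)
        if len(endpoints) != 2:
            continue  # whole-volume statements carry no issue numbers
        (vol_a, first), (vol_b, last) = endpoints
        if vol_a != vol_b:
            continue  # ranges spanning volumes are out of scope for this sketch
        held[int(vol_a)].update(range(int(first), int(last) + 1))

    gaps = {}
    for volume, issues in sorted(held.items()):
        missing = [n for n in range(min(issues), max(issues) + 1)
                   if n not in issues]
        if missing:
            gaps[volume] = missing
    return gaps


if __name__ == "__main__":
    sample = [
        "v.95:no.3 (1973)-v.95:no.8 (1973)",
        "v.95:no.10 (1973)-v.95:no.26 (1973)",
        "v.96 (1974)-v.97 (1975)",
        "v.98:no.1 (1976)-v.98:no.14 (1976)",
        "v.98:no.16 (1976)-v.98:no.26 (1976)",
        "v.99:no.1 (1977)-v.99:no.25 (1977)",
    ]
    print(interior_gaps(sample))  # {95: [9], 98: [15]}
```

Whether v.95 no.9 is truly missing, or simply never had a microfiche supplement, is something the holdings statement alone cannot tell us.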




On Tue, Jun 15, 2010 at 9:56 PM, Bill Dueber  wrote:

> On Tue, Jun 15, 2010 at 5:49 PM, Kyle Banerjee wrote:
> > No, but parsing holding statements for something that just gets cut off
> > early or which starts late should be easy unless entry is insanely
> > inconsistent.
>
> And there it is. :-)
>
> We're really dealing with a few problems here:
>
>  - Inconsistent entry by catalogers (probably the least of our worries)
>  - Inconsistent publishing schedules (e.g., the Jan 1942 issue was
> just plain never printed)
>  - Inconsistent use of volume/number/year/month/whatever throughout a
> serial's run.
>
> So, for example, http://mirlyn.lib.umich.edu/Record/45417/Holdings#1
>
> There are six holdings:
>
> 1919-1920 incompl
> 1920 incompl.
> 1922
> v.4 no.49
> v.6 1921 jul-dec
> v.6 1921jan-jun
>
> We have no way of knowing what year volume 4 was printed in, which
> issues are incomplete in the two volumes that cover 1920, whether
> volume numbers are associated with earlier (or later) issues, etc. We,
> as humans, could try to make some guesses, but they'd just be guesses.
>
> It's easy to find examples where month ranges overlap (or leave gaps),
> where month names and issue numbers are sometimes used
> interchangeably, where volume numbers suddenly change in the middle of
> a run because of a merge with another serial (or where the first
> volume isn't "1" because the serial broke off from a parent), etc.
> etc. etc.
>
> I don't mean to overstate the problem. For many (most?) serials whose
> existence only goes back a few decades, a relatively simple approach
> will likely work much of the time -- although even that relatively
> simple approach will have to take into account a solid dozen or so
> different ways that enumcron data may have been entered.
>
> But to be able to say, with some confidence, that we have the full
> run? Or a particular issue as labeled by a month name? Much, much
> harder in the general case.
>
>
>  -Bill-
>
>
> --
> Bill Dueber
> Library Systems Programmer
> University of Michigan Library
>


Re: [CODE4LIB] new version of cql-ruby

2010-06-16 Thread Jakob Voss

On 15.06.2010 16:36, Jonathan Rochkind wrote:


cql-ruby is a ruby gem for parsing CQL, and serializing parse trees back
to CQL, to xCQL, or to a solr query.

A new version has been released, 0.8.0, available from gem update/install.

The new version improves greatly on the #to_solr serialization as a solr
query, providing support for translation from more CQL relations than
previously, fixing a couple bugs, and making #to_solr raise appropriate
exceptions if you try to convert CQL that is not supported for #to_solr.
See: http://cql-ruby.rubyforge.org/svn/trunk/lib/cql_ruby/cql_to_solr.rb


At the recent ELAG conference we had a workshop to get started with 
Solr. It is pretty easy to index stuff (even obscure things like MARC 
data) but as far as I understand most times you create an interface 
above your Solr instance - this can be a user interface or a wrapper to 
another API such as SRU/SRW or OpenSearch.


Is there a nice piece of code, a tutorial, or an example showing how to easily 
wrap your Solr instance to get a full SRU/SRW and/or OpenSearch 
interface? Converting CQL to Solr query format is just one part of a 
wrapper, isn't it?
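
To show where the CQL conversion sits inside such a wrapper, here is a small, hedged Python sketch of the request side only. It maps the SRU searchRetrieve parameters `query`, `startRecord`, and `maximumRecords` onto Solr's `q`, `start`, and `rows`. The function names and the toy translator are hypothetical; in a Ruby stack, cql-ruby's #to_solr could play the translator's role. Building the SRU XML response from Solr's results, plus explain and scan, would be the rest of a real wrapper and is not shown.

```python
from typing import Callable, Dict
from urllib.parse import urlencode


def toy_cql_to_solr(cql: str) -> str:
    """Hypothetical stand-in translator: handles only 'index = term'."""
    index, _, term = cql.partition("=")
    return f"{index.strip()}:{term.strip()}"


def sru_to_solr_params(sru: Dict[str, str],
                       cql_to_solr: Callable[[str], str],
                       default_rows: int = 10) -> Dict[str, str]:
    """Translate SRU searchRetrieve parameters into Solr select parameters."""
    if sru.get("operation") != "searchRetrieve":
        raise ValueError("only searchRetrieve is sketched here")

    # SRU counts records from 1; Solr's 'start' offset counts from 0.
    start_record = int(sru.get("startRecord", "1"))
    if start_record < 1:
        raise ValueError("startRecord must be 1 or greater")

    rows = int(sru.get("maximumRecords", default_rows))
    return {
        "q": cql_to_solr(sru["query"]),
        "start": str(start_record - 1),
        "rows": str(rows),
        "wt": "json",
    }


if __name__ == "__main__":
    request = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": "title = hamlet",
        "startRecord": "11",
        "maximumRecords": "5",
    }
    params = sru_to_solr_params(request, toy_cql_to_solr)
    # The query string would be appended to the Solr select URL.
    print(urlencode(params))
```

Keeping the translator as a pluggable function means the parameter mapping stays the same whichever CQL parser is used.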


Jakob

P.S: By the way we created yet another mailing list for Solr in 
libraries to discuss such things:


http://groups.google.com/group/solr4lib

--
Jakob Voß, skype: nichtich
Verbundzentrale des GBV (VZG) / Common Library Network
Platz der Goettinger Sieben 1, 37073 Göttingen, Germany
+49 (0)551 39-10242, http://www.gbv.de