Re: [CODE4LIB] WorldCat as an OpenURL endpoint ?

2010-06-15 Thread Bill Dueber
On Tue, Jun 15, 2010 at 5:49 PM, Kyle Banerjee  wrote:
> No, but parsing holding statements for something that just gets cut off
> early or which starts late should be easy unless entry is insanely
> inconsistent.

And there it is. :-)

We're really dealing with a few problems here:

 - Inconsistent entry by catalogers (probably the least of our worries)
 - Inconsistent publishing schedules (e.g., the Jan 1942 issue was
just plain never printed)
 - Inconsistent use of volume/number/year/month/whatever throughout a
serial's run.

So, for example, http://mirlyn.lib.umich.edu/Record/45417/Holdings#1

There are six holdings:

1919-1920 incompl
1920 incompl.
1922
v.4 no.49
v.6 1921 jul-dec
v.6 1921jan-jun

We have no way of knowing what year volume 4 was printed in, which
issues are incomplete in the two volumes that cover 1920, whether
volume numbers are associated with earlier (or later) issues, etc. We,
as humans, could try to make some guesses, but they'd just be guesses.

It's easy to find examples where month ranges overlap (or leave gaps),
where month names and issue numbers are sometimes used
interchangeably, where volume numbers suddenly change in the middle of
a run because of a merge with another serial (or where the first
volume isn't "1" because the serial broke off from a parent), etc.
etc. etc.

I don't mean to overstate the problem. For many (most?) serials whose
existence only goes back a few decades, a relatively simple approach
will likely work much of the time -- although even that relatively
simple approach will have to take into account a solid dozen or so
different ways that enumcron data may have been entered.
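
To give a flavor of it, here is a sketch of where that relatively simple
approach might start (Ruby; the patterns are invented for illustration and
cover only a few of the forms quoted above):

  # Sketch: classify a few common enumcron forms. Real data needs far
  # more patterns than these three.
  ENUMCRON_PATTERNS = {
    /\Av\.(\d+)\s+no\.(\d+)\z/ => :volume_issue,                 # "v.4 no.49"
    /\Av\.(\d+)\s+(\d{4})\s*([a-z]{3})-([a-z]{3})\z/ => :volume_year_months, # "v.6 1921 jul-dec"
    /\A(\d{4})(?:-(\d{4}))?(?:\s+incompl\.?)?\z/ => :year_range, # "1919-1920 incompl"
  }

  def classify(enumcron)
    s = enumcron.strip.downcase
    ENUMCRON_PATTERNS.each do |pattern, type|
      m = s.match(pattern)
      return { type: type, captures: m.captures.compact } if m
    end
    { type: :unparsed }
  end

Even where these match, of course, "incompl" still doesn't tell you which
issues are absent.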

But to be able to say, with some confidence, that we have the full
run? Or a particular issue as labeled by a month name? Much, much
harder in the general case.


  -Bill-


-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library


Re: [CODE4LIB] WorldCat as an OpenURL endpoint ?

2010-06-15 Thread Kyle Banerjee
> Oh you really do mean complete like "complete publication run"?  Very few
> of our journal holdings are "complete" in that sense, they are definitely in
> the minority.  We start getting something after issue 1, or stop getting it
> before the last issue. Or stop and then start again.
>
> Is this really unusual?


No, but parsing holding statements for something that just gets cut off
early or which starts late should be easy unless entry is insanely
inconsistent. If staff enter info even close to standard practices, you
still should be able to read a lot of it even when there are breaks. This is
when anal retentive behavior in the tech services dept saves your bacon.

This process will be lossy, but sometimes that's all you can do. Some
situations may be such that there's no reasonable fix that would
significantly improve things. But in that case, it makes sense to move on to
other problems. Otherwise, we wind up spending all our time futzing with
fringe use cases while people actually get what they need elsewhere.

kyle


Re: [CODE4LIB] WorldCat as an OpenURL endpoint ?

2010-06-15 Thread Jonathan Rochkind
Oh you really do mean complete like "complete publication run"?  Very 
few of our journal holdings are "complete" in that sense, they are 
definitely in the minority.  We start getting something after issue 1, 
or stop getting it before the last issue. Or stop and then start again.


Is this really unusual?

If all you've figured out is the "complete publication run" of a 
journal, and are assuming your library holds it... wait, how is this 
something you need for any actual use case?


My use case is trying to figure out IF we have a particular 
volume/issue, and ideally,  if so, what shelf is it located on.  If I'm 
just going to deal with journals we have the complete publication 
history of, I don't have a problem anymore, because the answer will 
always be "yes", that's a very simple algorithm, "print yes", heh.  So, 
yes, if you assume only holdings of complete publication histories, the 
problem does get very easy.


Incidentally, if anyone is looking for a schema and transmission format
for actual _structured_ holdings information, that's flexible enough for
idiosyncratic publication histories and holdings, but still structured
enough to actually be machine-actionable... I still can't recommend ONIX
Serial Holdings highly enough!  I don't think it gets much use,
probably because most of our systems simply don't _have_ this structured
information, most of our staff interfaces don't provide reasonably
efficient ways of entering it, etc. But if you can get the other
pieces and just need a schema and representation format, ONIX Serial
Holdings is nice!


Jonathan

Kyle Banerjee wrote:
> On Tue, Jun 15, 2010 at 10:13 AM, Jonathan Rochkind wrote:
>
> > I'm not sure what you mean by "complete" holdings? The library holds the
> > entire run of the journal from the first issue printed to the last/current?
> > Or just holdings that don't include "missing" statements?
>
> Obviously, there has to be some sort of holdings statement -- I'm presuming
> that something reasonably accurate is available. If there is no summary
> holdings statement, items aren't inventoried, but holdings are believed to
> be incomplete, there's not much to work with.
>
> As far as retrospectively getting data up to scratch in hopeless
> situations, there are paths that make sense. For instance,
> retrospectively inventorying serials may be insane. However, from circ and
> ILL data, you should know which titles are actually consulted the most. Get
> those in shape first and work backwards.
>
> In a major academic library, it may be the case that some titles are *never*
> handled, but that doesn't cause problems if no one wants them. For low use
> resources, it can make more sense to just handle things manually.
>
> > Perhaps other institutions have more easily parseable holdings data (or even
> > holdings data stored in structured form in the ILS) than mine.  For mine,
> > even holdings that don't include "missing" are not feasibly reliably
> > parseable, I've tried.
>
> Note that you can get structured holdings data from sources other than the
> library catalog -- if you know what's missing.
>
> Sounds like your situation is particularly challenging. But there are gains
> worth chasing. Service issues aside, problems like these raise existential
> questions.
>
> If we do an inadequate job of providing access, patrons will just turn to
> subscription databases and no one will even care about what we do or even if
> we're still around. Most major academic libraries never got their entire
> card collection in the online catalog. Patrons don't use that stuff anymore,
> and almost no one cares (even among librarians). It would be a mistake to
> think this can't happen again.
>
> kyle


Re: [CODE4LIB] WorldCat as an OpenURL endpoint ?

2010-06-15 Thread Kyle Banerjee
On Tue, Jun 15, 2010 at 10:13 AM, Jonathan Rochkind wrote:

> I'm not sure what you mean by "complete" holdings? The library holds the
> entire run of the journal from the first issue printed to the last/current?
> Or just holdings that don't include "missing" statements?
>

Obviously, there has to be some sort of holdings statement -- I'm presuming
that something reasonably accurate is available. If there is no summary
holdings statement, items aren't inventoried, but holdings are believed to
be incomplete, there's not much to work with.

As far as retrospectively getting data up to scratch in hopeless
situations, there are paths that make sense. For instance,
retrospectively inventorying serials may be insane. However, from circ and
ILL data, you should know which titles are actually consulted the most. Get
those in shape first and work backwards.

In a major academic library, it may be the case that some titles are *never*
handled, but that doesn't cause problems if no one wants them. For low use
resources, it can make more sense to just handle things manually.

Perhaps other institutions have more easily parseable holdings data (or even
> holdings data stored in structured form in the ILS) than mine.  For mine,
> even holdings that don't include "missing" are not feasibly reliably
> parseable, I've tried.
>

Note that you can get structured holdings data from sources other than the
library catalog -- if you know what's missing.

Sounds like your situation is particularly challenging. But there are gains
worth chasing. Service issues aside, problems like these raise existential
questions.

If we do an inadequate job of providing access, patrons will just turn to
subscription databases and no one will even care about what we do or even if
we're still around. Most major academic libraries never got their entire
card collection in the online catalog. Patrons don't use that stuff anymore,
and almost no one cares (even among librarians). It would be a mistake to
think this can't happen again.

kyle


[CODE4LIB] code4lib.hu codesprint report

2010-06-15 Thread Király Péter

Hi!

I am glad to report that we had the first code4lib.hu codesprint yesterday.
The purpose was to code with each other, and to learn something from
each other. It was a 3.5-hour session at the National Széchényi Library,
Budapest. We created a script which extracts ISBN numbers and book
cover images from an OAI-PMH data provider, embedded as METS
records. Hopefully this code will become part of two or three different
library- or book-related services in the coming months. We discussed the
technical details, the advantages, and the real problems of uploading
a local history photo collection to Flickr. Unfortunately we didn't
have time to code the Flickr part.
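
The ISBN-extraction step, in rough outline (a Ruby sketch for illustration
-- the actual script is in PHP -- and the mods:identifier XPath is an
assumption about how these METS records embed the data):

  require 'nokogiri'
  require 'open-uri'

  # Sketch: fetch one page of METS records from the OAI-PMH provider
  # and pull out the ISBNs.
  url = 'http://example.org/oai?verb=ListRecords&metadataPrefix=mets'
  doc = Nokogiri::XML(URI.open(url))
  isbns = doc.xpath('//mods:identifier[@type="isbn"]',
                    'mods' => 'http://www.loc.gov/mods/v3').map(&:text)
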
There were only a couple of coders, but we had a good talk and made new
acquaintances.

(For those in #code4lib: this time we had no bbq, nor 'slambuc', but lots of
biscuits and mineral water. ;-)

If - for whatever reason - you want to follow or join us, see our group 
page:

http://groups.google.com/group/ikr-fejlesztok/

The meeting was run as a section of the Library's K2 (library 2.0)
task force's workshop about the usage of library 2.0 tools.
http://blog.konyvtar.hu/k2/

Some technical details:
- we use PHP as the common language
- for OAI-PMH harvesting we use Omeka's OAI harvester plugin
- for Flickr communication we planned to use Phlickr, a PHP library
- the OAI server we harvested runs at the University of Debrecen, and is based
on DSpace

- we found a bug in the Ubuntu version of PHP 5.2.10 (SimpleXMLElement has
a problem with the xpath() method) - but we found a workaround as well.

Regards,
Péter


Re: [CODE4LIB] WorldCat as an OpenURL endpoint ?

2010-06-15 Thread Jonathan Rochkind

Tom Keays wrote:

> a) the person may not have (or know they have) an affiliation to a given
> institution,

Then how is WorldCat going to help, if they have no idea which of the
institutions listed they might be able to get the book from!

> b) may be coming from outside their institution's IP range so that even the
> OCLC Registry redirect trick will fail to get them to a (let alone the
> "correct") link resolver,

I think LibX-type plugins are the only practical solution, and while
requiring a browser plugin is unfortunate (and only works in certain
browsers), I can't think of a better one.  You're right this is a
problem. But with LibX, it can be as simple as "please install this
plugin", and if the LibX-style plugin is sophisticated enough,
everything will Just Work, they'll get a button everywhere they want it
to connect them to their local institution.

> c) there may not be any recourse to find an item if the institution does not
> own it (MPOW does not provide a link to WorldCat).

Well, then that's something to take up with YPOW, if you think a 
WorldCat link is a valuable service to the user, but YPOW disagrees. 
That's not a technical problem, that's a policy problem. Ideally, I'd 
like to get Umlaut doing better than just a link to worldcat, I'd like 
to get it showing Worldcat holdings directly on the screen, including 
things like "closest public library", with those libraries being 
hyperlinked directly to the library's web page or even individual 
catalog record (theoretically sort of possible with WorldCat services).  
(Along with a link to worldcat for more info).


Now, you're right that this scheme still has some problems.  But to me, 
the problem with the WorldCat page is it's just not a sufficient 
interface. It suffers from some of the same problems you mention -- they 
won't get an ILL link unless they are on-campus and your institution has 
configured it properly.  It doesn't give easy access to local library 
services like placing an ILS "request" for circulation desk hold or 
physical delivery (if it gives this access at all, it's only by a chain 
of several non-obvious clicks). Etc.


You're right that if a user doesn't have an institution with a link 
resolver, then WorldCat might be the best they can do. It would be nice 
if WorldCat interface were somewhat better for this, but it's a tricky 
problem.  Most public libraries don't have link resolvers -- I think 
they ought to (and it should be Umlaut!), but most public libraries 
haven't allocated limited resources to digital services like this.  Even 
most academic libraries don't have very _good_ link resolver interfaces 
(again, Umlaut!).


It is an imperfect world we live in, indeed. 


Jonathan



> Tom
>
> On Tue, Jun 15, 2010 at 12:16 PM, Walker, David wrote:
>
> > > It seems like the more productive path if the goal of a user is
> > > simply to locate a copy, where ever it is held.
> >
> > But I don't think users have *locating a copy* as their goal.  Rather, I
> > think their goal is to *get their hands on the book*.
> >
> > If I discover a book via COinS, and you drop me off at Worldcat.org, that
> > allows me to see which libraries own the book.  But, unless I happen to be
> > affiliated with those institutions, that's kinda useless information.  I
> > have no real way of actually getting the book itself.
> >
> > If, instead, you drop me off at your institution's link resolver menu, and
> > provide me an ILL option in the event you don't have the book, the library
> > can get the book for me, which is really my *goal*.
> >
> > That seems like the more productive path, IMO.
> >
> > --Dave
> >
> > ==
> > David Walker
> > Library Web Services Manager
> > California State University
> > http://xerxes.calstate.edu
> >
> > From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Tom Keays
> > [tomke...@gmail.com]
> > Sent: Tuesday, June 15, 2010 8:43 AM
> > To: CODE4LIB@LISTSERV.ND.EDU
> > Subject: Re: [CODE4LIB] WorldCat as an OpenURL endpoint ?
> >
> > On Mon, Jun 14, 2010 at 3:47 PM, Jonathan Rochkind wrote:
> >
> > > The trick here is that traditional library metadata practices make it _very
> > > hard_ to tell if a _specific volume/issue_ is held by a given library.  And
> > > those are the most common use cases for OpenURL.
> >
> > Yep. That's true even for individual libraries with link resolvers. OCLC is
> > not going to be able to solve that particular issue until the local
> > libraries do.
> >
> > > If you just want to get to the title level (for a journal or a book), you
> > > can easily write your own thing that takes an OpenURL, and either just
> > > redirects straight to worldcat.org on isbn/lccn/oclcnum, or actually does
> > > a WorldCat API lookup to ensure the record exists first and/or looks up on
> > > author/title/etc too.
> >
> > I was mainly thinking of sources that use COinS. If you have a rarely held
> > book, for instance, then OpenURLs resolved against random institutional
> > endpoints are going to mostly be unproductive. However, a "union" catalog
> > such as OCLC already has the information about libraries in the system that
> > own it. It seems like the more productive path if the goal of a user is
> > simply to locate a copy, where ever it is held.

Re: [CODE4LIB] WorldCat as an OpenURL endpoint ?

2010-06-15 Thread Jonathan Rochkind
I'm not sure what you mean by "complete" holdings? The library holds the 
entire run of the journal from the first issue printed to the 
last/current? Or just holdings that don't include "missing" statements?


Perhaps other institutions have more easily parseable holdings data (or 
even holdings data stored in structured form in the ILS) than mine.  For 
mine, even holdings that don't include "missing" are not feasibly 
reliably parseable, I've tried.


Jonathan

Kyle Banerjee wrote:
> > But if you think it's easy, please, give it a try and get back to us. :)
> > Maybe your library's data is cleaner than mine.
>
> I don't think it's easy, but I think detecting *complete* holdings is a big
> part of the picture and that can be done fairly well.
>
> Cleanliness of data will vary from one institution to another, and quite a
> bit of it will be parseable. Even if you can only get half, you're
> still way ahead of where you'd otherwise be.
>
> > I think it's kind of a crime that our ILS (and many other ILSs) doesn't
> > provide a way for holdings to be efficiently entered (or guessed from
> > prediction patterns etc) AND converted to an internal structured format that
> > actually contains the semantic info we want.
>
> There's too much variation in what people want to do.  Even going with
> manual MFHD, it's still pretty easy to generate stuff that's pretty hard to
> parse.
>
> kyle


Re: [CODE4LIB] WorldCat as an OpenURL endpoint ?

2010-06-15 Thread Tom Keays
I do provide the user with the proxied WorldCat URL for just the reasons
Jonathan cites. But, no, being an otherwise open web resource, you can't
force a user to use it.
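
(The proxied URL is just the usual EZProxy prefix form, with a hypothetical
proxy host:

  http://ezproxy.example.edu/login?url=http://www.worldcat.org/

so it only helps users who actually enter through that link.)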

On Tue, Jun 15, 2010 at 12:22 PM, Jonathan Rochkind wrote:

>
> I haven't yet found any good way to do this if the user is off-campus
> (ezproxy not a good solution, how do we 'force' the user to use ezproxy for
> worldcat.org anyway?).
>
>


Re: [CODE4LIB] WorldCat as an OpenURL endpoint ?

2010-06-15 Thread Tom Keays
I think my perspective of the user's goal is actually the same (or close
enough to the same) as David's, just stated differently. The user wants the
most local copy or, failing that, a way to order it from another source.

However, I have plenty of examples of faculty and occasional grad students
who are willing to make the trek to a nearby library -- even out of town
libraries -- rather than do ILL. This doesn't encompass every use case or
even a typical use case (are there typical cases?), but it does no harm to
have information even if you can't always act on it.

The problem with OpenURL tied to a particular institution is
a) the person may not have (or know they have) an affiliation to a given
institution,
b) may be coming from outside their institution's IP range so that even the
OCLC Registry redirect trick will fail to get them to a (let alone the
"correct") link resolver,
c) there may not be any recourse to find an item if the institution does not
own it (MPOW does not provide a link to WorldCat).

Tom

On Tue, Jun 15, 2010 at 12:16 PM, Walker, David wrote:

> > It seems like the more productive path if the goal of a user is
> > simply to locate a copy, where ever it is held.
>
> But I don't think users have *locating a copy* as their goal.  Rather, I
> think their goal is to *get their hands on the book*.
>
> If I discover a book via COinS, and you drop me off at Worldcat.org, that
> allows me to see which libraries own the book.  But, unless I happen to be
> affiliated with those institutions, that's kinda useless information.  I
> have no real way of actually getting the book itself.
>
> If, instead, you drop me off at your institution's link resolver menu, and
> provide me an ILL option in the event you don't have the book, the library
> can get the book for me, which is really my *goal*.
>
> That seems like the more productive path, IMO.
>
> --Dave
>
> ==
> David Walker
> Library Web Services Manager
> California State University
> http://xerxes.calstate.edu
> 
> From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Tom Keays
> [tomke...@gmail.com]
> Sent: Tuesday, June 15, 2010 8:43 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] WorldCat as an OpenURL endpoint ?
>
> On Mon, Jun 14, 2010 at 3:47 PM, Jonathan Rochkind 
> wrote:
>
> > The trick here is that traditional library metadata practices make it _very
> > hard_ to tell if a _specific volume/issue_ is held by a given library.  And
> > those are the most common use cases for OpenURL.
> >
>
> Yep. That's true even for individual libraries with link resolvers. OCLC is
> not going to be able to solve that particular issue until the local
> libraries do.
>
>
> > If you just want to get to the title level (for a journal or a book), you
> > can easily write your own thing that takes an OpenURL, and either just
> > redirects straight to worldcat.org on isbn/lccn/oclcnum, or actually does
> > a WorldCat API lookup to ensure the record exists first and/or looks up on
> > author/title/etc too.
> >
>
> I was mainly thinking of sources that use COinS. If you have a rarely held
> book, for instance, then OpenURLs resolved against random institutional
> endpoints are going to mostly be unproductive. However, a "union" catalog
> such as OCLC already has the information about libraries in the system that
> own it. It seems like the more productive path if the goal of a user is
> simply to locate a copy, where ever it is held.
>
>
> > Umlaut already includes the 'naive' "just link to worldcat.org based on
> > isbn, oclcnum, or lccn" approach, functionality that was written before the
> > worldcat api existed. That is, Umlaut takes an incoming OpenURL, and provides
> > the user with a link to a worldcat record based on isbn, oclcnum, or lccn.
> >
>
> Many institutions have chosen to do this. MPOW, however, represents a
> counter-example and does not link out to OCLC.
>
> Tom
>


Re: [CODE4LIB] WorldCat as an OpenURL endpoint ?

2010-06-15 Thread Kyle Banerjee
>
> But if you think it's easy, please, give it a try and get back to us. :)
> Maybe your library's data is cleaner than mine.
>
>
I don't think it's easy, but I think detecting *complete* holdings is a big
part of the picture and that can be done fairly well.

Cleanliness of data will vary from one institution to another, and quite a
bit of it will be parseable. Even if you can only get half, you're
still way ahead of where you'd otherwise be.


> I think it's kind of a crime that our ILS (and many other ILSs) doesn't
> provide a way for holdings to be efficiently entered (or guessed from
> prediction patterns etc) AND converted to an internal structured format that
> actually contains the semantic info we want.


There's too much variation in what people want to do.  Even going with
manual MFHD, it's still pretty easy to generate stuff that's pretty hard to
parse.

kyle


Re: [CODE4LIB] WorldCat as an OpenURL endpoint ?

2010-06-15 Thread Markus Fischer

Kyle Banerjee wrote:

> This might not be as bad as people think. The normal argument is that
> holdings are in free text and there's no way staff will ever have enough
> time to record volume level holdings. However, significant chunks of the
> problem can be addressed using relatively simple methods.
>
> For example, if you can identify complete runs, you know that a library has
> all holdings and can start automating things.


That's what we've done for journal holdings (only) in

https://sourceforge.net/projects/doctor-doc/

It works perfectly in combination with an EZB account
(rzblx1.uni-regensburg.de/ezeit) as a link resolver. It can be as exact as
the issue level.


The tool is being used by around 100 libraries in Germany, Switzerland
and Austria.


If you check this one out: don't expect the perfect open-source system. It
has been developed by me (a head of library, not an IT professional) and a
colleague (an IT professional). I learned a lot through this one.


There is plenty of room for improvement in it: some things are not yet
implemented so nicely, other things are done quite nicely ;-)


If you want to discuss, use or contribute:

https://sourceforge.net/projects/doctor-doc/support

Very welcome!

Markus Fischer



> While my comments are mostly concerned with journal holdings, similar logic
> can be used with monographic series as well.
>
> kyle


Re: [CODE4LIB] Code4Lib Northwest meeting report

2010-06-15 Thread Shirley Lincicum
A couple of presenters have already added links to their presentation slides
on the Schedule page at:
http://groups.google.com/group/pnwcode4lib/web/code4lib-northwest-2010

Perhaps we could encourage other presenters to do this as well?

Shirley

On Tue, Jun 15, 2010 at 9:32 AM, Kyle Banerjee wrote:

> Event was not recorded, but I'm sure a shoutout for slides will generate
> some nice stuff to link to. Note to self: in future, it might not be a bad
> idea to copy slides to a flash drive after each presentation
>
> kyle
>
> On Mon, Jun 14, 2010 at 7:29 PM, Ed Summers wrote:
>
> > Wow, this looks like it was a great event. I don't suppose any of the
> > talks were recorded, or that any slides are available? I'm
> > particularly interested in Karen Estlund's talk about NoCode: Digital
> > Preservation of Electronic Records...and well, all of the talks :-)
> >
> > //Ed
> >
> > On Mon, Jun 14, 2010 at 2:22 PM, Kyle Banerjee wrote:
> > > Code4Lib Northwest was held June 7 at the White Stag building in
> > > Portland, OR.
> > >
> > > Registration was closed and a waiting list established at least a month
> > > before the event because the room capacity of 65 was reached. Ten 20
> > > minute sessions and 13 lightning talks listed at
> > > http://groups.google.com/group/pnwcode4lib/web/code4lib-northwest-2010 made
> > > for a full day. Our official timekeeper (a screaming flying monkey who
> > > gets upset if people yak too long) was the only one who was bored as
> > > everyone stayed focused to the end.
> > >
> > > About half of the attendees filled out evaluation forms. Roughly 70%
> > > rated it as excellent, and everyone else gave it the next highest
> > > rating. A number of themes appeared in the responses. People loved the
> > > format consisting of short presentations and lightning talks. They also
> > > gave the content, food, and venue high marks.
> > >
> > > However, both at the post conference evaluation and in written comments,
> > > people said they'd like more opportunity to interact directly with
> > > others. Breakout sessions, short Q/A sessions after every couple
> > > speakers, and other ideas were suggested. These ideas and others will be
> > > considered for improving next year's event.
> > >
> > > Significantly, a number of excellent technologists with a feel for the
> > > code4lib spirit indicated willingness to help organize the next Code4Lib
> > > Northwest, so we're already looking forward to Code4Lib Northwest 2011.
> > >
> > > Respectfully submitted,
> > >
> > > kyle
> > >
>
> --
> --
> Kyle Banerjee
> Digital Services Program Manager
> Orbis Cascade Alliance
> baner...@uoregon.edu / 503.999.9787


Re: [CODE4LIB] WorldCat as an OpenURL endpoint ?

2010-06-15 Thread Jonathan Rochkind
When I've tried to do this, it's been much harder than your story, I'm 
afraid.


My library data is very inconsistent in the way it expresses its
holdings. Even _without_ "missing" items, the holdings are expressed in
human-readable narrative form which is very difficult to parse reliably.


Theoretically, the holdings are expressed according to -- I forget the
name of the Z39 standard (Z39.71, I believe) -- a standard for expressing
human-readable holdings with certain punctuation and such. Even if they
really WERE all exactly according to this standard, this standard is not
very easy to parse consistently and reliably. But in fact, since when these
tags are entered nothing validates them to this standard -- and at different
times in history the cataloging staff entering them in various libraries
had various ideas about how strictly they should follow this local
"policy" -- our holdings are not even reliably according to that standard.


But if you think it's easy, please, give it a try and get back to us. :) 
Maybe your library's data is cleaner than mine.


I think it's kind of a crime that our ILS (and many other ILSs) doesn't
provide a way for holdings to be efficiently entered (or guessed from
prediction patterns etc) AND converted to an internal structured format
that actually contains the semantic info we want. Offering catalogers
the option to manually enter an MFHD is not a solution.


Jonathan

Kyle Banerjee wrote:
> > > The trick here is that traditional library metadata practices make it _very
> > > hard_ to tell if a _specific volume/issue_ is held by a given library.  And
> > > those are the most common use cases for OpenURL.
> >
> > Yep. That's true even for individual libraries with link resolvers. OCLC is
> > not going to be able to solve that particular issue until the local
> > libraries do.
>
> This might not be as bad as people think. The normal argument is that
> holdings are in free text and there's no way staff will ever have enough
> time to record volume level holdings. However, significant chunks of the
> problem can be addressed using relatively simple methods.
>
> For example, if you can identify complete runs, you know that a library has
> all holdings and can start automating things.
>
> With this in mind, the first step is to identify incomplete holdings. The
> mere presence of lingo like "missing," "lost," "incomplete," "scattered,"
> "wanting," etc. is a dead giveaway.  So are bracketed fields that contain
> enumeration or temporal data (though you'll get false hits using this method
> when catalogers supply enumeration). Commas in any field that contains
> enumeration or temporal data also indicate incomplete holdings.
>
> I suspect that the mere presence of a note is a great indicator that
> holdings are incomplete, since what kind of yutz writes a note saying "all
> the holdings are here just like you'd expect"? Having said that, I need to
> crawl through a lot more data before being comfortable with that statement.
>
> Regexp matches can be used to search for closed date ranges in open serials
> or close dates within 866 that don't correspond to close dates within fixed
> fields.
>
> That's the first pass. The second pass would be to search for the most
> common patterns that occur within incomplete holdings. Wash, rinse, repeat.
> After a while, you'll get to all the cornball schemes that don't lend
> themselves towards automation, but hopefully that group of materials is
> getting to a more manageable size where throwing labor at the metadata makes
> some sense. Possibly guessing if a volume is available based on timeframe is
> a good way to go.
>
> Worst case scenario, if the program can't handle it, you deflect the
> request to the next institution, and that already happens all the time for a
> variety of reasons.
>
> While my comments are mostly concerned with journal holdings, similar logic
> can be used with monographic series as well.
>
> kyle


Re: [CODE4LIB] Code4Lib Northwest meeting report

2010-06-15 Thread Kyle Banerjee
Event was not recorded, but I'm sure a shoutout for slides will generate
some nice stuff to link to. Note to self: in future, it might not be a bad
idea to copy slides to a flash drive after each presentation

kyle

On Mon, Jun 14, 2010 at 7:29 PM, Ed Summers wrote:

> Wow, this looks like it was a great event. I don't suppose any of the
> talks were recorded, or that any slides are available? I'm
> particularly interested in Karen Estlund's talk about NoCode: Digital
> Preservation of Electronic Records...and well, all of the talks :-)
>
> //Ed
>
> On Mon, Jun 14, 2010 at 2:22 PM, Kyle Banerjee wrote:
> > Code4Lib Northwest was held June 7 at the White Stag building in
> > Portland, OR.
> >
> > Registration was closed and a waiting list established at least a month
> > before the event because the room capacity of 65 was reached. Ten 20
> > minute sessions and 13 lightning talks listed at
> > http://groups.google.com/group/pnwcode4lib/web/code4lib-northwest-2010 made
> > for a full day. Our official timekeeper (a screaming flying monkey who
> > gets upset if people yak too long) was the only one who was bored as
> > everyone stayed focused to the end.
> >
> > About half of the attendees filled out evaluation forms. Roughly 70%
> > rated it as excellent, and everyone else gave it the next highest
> > rating. A number of themes appeared in the responses. People loved the
> > format consisting of short presentations and lightning talks. They also
> > gave the content, food, and venue high marks.
> >
> > However, both at the post conference evaluation and in written comments,
> > people said they'd like more opportunity to interact directly with
> > others. Breakout sessions, short Q/A sessions after every couple
> > speakers, and other ideas were suggested. These ideas and others will be
> > considered for improving next year's event.
> >
> > Significantly, a number of excellent technologists with a feel for the
> > code4lib spirit indicated willingness to help organize the next Code4Lib
> > Northwest, so we're already looking forward to Code4Lib Northwest 2011.
> >
> > Respectfully submitted,
> >
> > kyle
> >



-- 
--
Kyle Banerjee
Digital Services Program Manager
Orbis Cascade Alliance
baner...@uoregon.edu / 503.999.9787


Re: [CODE4LIB] WorldCat as an OpenURL endpoint ?

2010-06-15 Thread Kyle Banerjee
>
> > The trick here is that traditional library metadata practices make it
> _very
> > hard_ to tell if a _specific volume/issue_ is held by a given library.
>  And
> > those are the most common use cases for OpenURL.
> >
>
> Yep. That's true even for individual libraries with link resolvers. OCLC is
> not going to be able to solve that particular issue until the local
> libraries do.
>

This might not be as bad as people think. The normal argument is that
holdings are in free text and there's no way staff will ever have enough
time to record volume level holdings. However, significant chunks of the
problem can be addressed using relatively simple methods.

For example, if you can identify complete runs, you know that a library has
all holdings and can start automating things.

With this in mind, the first step is to identify incomplete holdings. The
mere presence of lingo like "missing," "lost," "incomplete," "scattered,"
"wanting," etc. is a dead giveaway.  So are bracketed fields that contain
enumeration or temporal data (though you'll get false hits using this method
when catalogers supply enumeration). Commas in any field that contains
enumeration or temporal data also indicate incomplete holdings.
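
In code, that first pass can be a handful of regexps. A rough sketch (the
lingo list comes from the paragraph above; the bracket and comma patterns
are loose illustrations):

  # Sketch: flag a summary holdings statement as probably incomplete.
  INCOMPLETE_LINGO = /\b(missing|lost|incompl\w*|scattered|wanting)\b/i

  def suspect_holdings?(statement)
    return true if statement =~ INCOMPLETE_LINGO
    return true if statement =~ /\[[^\]]*\d[^\]]*\]/ # bracketed enum/dates, e.g. "[v.3]"
    return true if statement =~ /\d\s*,/             # comma in enumeration, e.g. "v.1, v.3-v.20"
    false
  end

  suspect_holdings?("v.1-v.20 (1901-1920)")      # => false
  suspect_holdings?("v.1, v.3-v.20 (1901-1920)") # => true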

I suspect that the mere presence of a note is a great indicator that
holdings are incomplete, since what kind of yutz writes a note saying "all
the holdings are here just like you'd expect"? Having said that, I need to
crawl through a lot more data before being comfortable with that statement.

Regexp matches can be used to search for closed date ranges in open serials
or close dates within 866 that don't correspond to close dates within fixed
fields.
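
The 866-versus-fixed-fields check, with ruby-marc, might look like this
sketch (it assumes the 866 textual holdings ride on the same record as a
bib-style 008, which won't be true everywhere):

  require 'marc'

  # Sketch: does the last year in any 866 subfield a disagree with the
  # serial's ending date in 008/11-14? ("9999" means still open.)
  def close_date_mismatch?(record)
    date2 = record['008'] ? record['008'].value[11, 4] : nil
    return false unless date2 =~ /\A\d{4}\z/ && date2 != '9999'
    end_years = record.fields('866').map { |f|
      f['a'].to_s.scan(/\b(1[6-9]\d\d|20\d\d)\b/).flatten.last
    }.compact
    !end_years.empty? && end_years.max != date2
  end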

That's the first pass. The second pass would be to search for the most
common patterns that occur within incomplete holdings. Wash, rinse, repeat.
After a while, you'll get to all the cornball schemes that don't lend
themselves towards automation, but hopefully that group of materials is
getting to a more manageable size where throwing labor at the metadata makes
some sense. Possibly guessing if a volume is available based on timeframe is
a good way to go.

Worst case scenario, if the program can't handle it, you deflect the
request to the next institution, and that already happens all the time for a
variety of reasons.

While my comments are mostly concerned with journal holdings, similar logic
can be used with monographic series as well.

kyle


Re: [CODE4LIB] WorldCat as an OpenURL endpoint ?

2010-06-15 Thread Jonathan Rochkind
IF the user is coming from a recognized on-campus IP, you can configure 
WorldCat to give the user an ILL link to your library too. At least if 
you use ILLiad, maybe if you use something else (esp if your ILL 
software can accept OpenURLs too!).


I haven't yet found any good way to do this if the user is off-campus 
(ezproxy not a good solution, how do we 'force' the user to use ezproxy 
for worldcat.org anyway?).


But in any event, I agree with Dave that worldcat.org isn't a great 
interface even if you DO get it to have an ILL link in an odd place. I 
think we can do better. Which is really the whole purpose of Umlaut as 
an institutional link resolver, giving the user a better screen for "I 
found this citation somewhere else, library what can you do to get it in 
my hands asap?"


Still wondering why Umlaut hasn't gotten more interest from people, heh. 
But we're using it here at JHU, and NYU and the New School are also 
using it.


Jonathan

Walker, David wrote:
> > It seems like the more productive path if the goal of a user is
> > simply to locate a copy, where ever it is held.
>
> But I don't think users have *locating a copy* as their goal.  Rather, I think
> their goal is to *get their hands on the book*.
>
> If I discover a book via COinS, and you drop me off at Worldcat.org, that
> allows me to see which libraries own the book.  But, unless I happen to be
> affiliated with those institutions, that's kinda useless information.  I have
> no real way of actually getting the book itself.
>
> If, instead, you drop me off at your institution's link resolver menu, and
> provide me an ILL option in the event you don't have the book, the library can
> get the book for me, which is really my *goal*.
>
> That seems like the more productive path, IMO.
>
> --Dave
>
> ==
> David Walker
> Library Web Services Manager
> California State University
> http://xerxes.calstate.edu
>
> From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Tom Keays
> [tomke...@gmail.com]
> Sent: Tuesday, June 15, 2010 8:43 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] WorldCat as an OpenURL endpoint ?
>
> On Mon, Jun 14, 2010 at 3:47 PM, Jonathan Rochkind wrote:
>
> > The trick here is that traditional library metadata practices make it _very
> > hard_ to tell if a _specific volume/issue_ is held by a given library.  And
> > those are the most common use cases for OpenURL.
>
> Yep. That's true even for individual libraries with link resolvers. OCLC is
> not going to be able to solve that particular issue until the local
> libraries do.
>
> > If you just want to get to the title level (for a journal or a book), you
> > can easily write your own thing that takes an OpenURL, and either just
> > redirects straight to worldcat.org on isbn/lccn/oclcnum, or actually does
> > a WorldCat API lookup to ensure the record exists first and/or looks up on
> > author/title/etc too.
>
> I was mainly thinking of sources that use COinS. If you have a rarely held
> book, for instance, then OpenURLs resolved against random institutional
> endpoints are going to mostly be unproductive. However, a "union" catalog
> such as OCLC already has the information about libraries in the system that
> own it. It seems like the more productive path if the goal of a user is
> simply to locate a copy, where ever it is held.
>
> > Umlaut already includes the 'naive' "just link to worldcat.org based on
> > isbn, oclcnum, or lccn" approach, functionality that was written before the
> > worldcat api existed. That is, Umlaut takes an incoming OpenURL, and provides
> > the user with a link to a worldcat record based on isbn, oclcnum, or lccn.
>
> Many institutions have chosen to do this. MPOW, however, represents a
> counter-example and does not link out to OCLC.
>
> Tom


Re: [CODE4LIB] WorldCat as an OpenURL endpoint ?

2010-06-15 Thread Walker, David
> It seems like the more productive path if the goal of a user is
> simply to locate a copy, where ever it is held.

But I don't think users have *locating a copy* as their goal.  Rather, I think 
their goal is to *get their hands on the book*.

If I discover a book via COinS, and you drop me off at Worldcat.org, that
allows me to see which libraries own the book.  But, unless I happen to be 
affiliated with those institutions, that's kinda useless information.  I have 
no real way of actually getting the book itself.
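
(A COinS span, for reference, is just an empty HTML element carrying an
OpenURL ContextObject in its title attribute, roughly like this, with the
ISBN picked arbitrarily:

  <span class="Z3988"
        title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&amp;rft.isbn=0596516177"></span>

A COinS-aware tool such as LibX or Zotero spots the Z3988 class and builds
a link to the user's own resolver from it.)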

If, instead, you drop me off at your institution's link resolver menu, and 
provide me an ILL option in the event you don't have the book, the library can 
get the book for me, which is really my *goal*.

That seems like the more productive path, IMO.

--Dave

==
David Walker
Library Web Services Manager
California State University
http://xerxes.calstate.edu

From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Tom Keays 
[tomke...@gmail.com]
Sent: Tuesday, June 15, 2010 8:43 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] WorldCat as an OpenURL endpoint ?

On Mon, Jun 14, 2010 at 3:47 PM, Jonathan Rochkind  wrote:

> The trick here is that traditional library metadata practices make it _very
> hard_ to tell if a _specific volume/issue_ is held by a given library.  And
> those are the most common use cases for OpenURL.
>

Yep. That's true even for individual libraries with link resolvers. OCLC is
not going to be able to solve that particular issue until the local
libraries do.


> If you just want to get to the title level (for a journal or a book), you
> can easily write your own thing that takes an OpenURL, and either just
> redirects straight to worldcat.org on isbn/lccn/oclcnum, or actually does
> a WorldCat API lookup to ensure the record exists first and/or looks up on
> author/title/etc too.
>

I was mainly thinking of sources that use COinS. If you have a rarely held
book, for instance, then OpenURLs resolved against random institutional
endpoints are going to mostly be unproductive. However, a "union" catalog
such as OCLC already has the information about libraries in the system that
own it. It seems like the more productive path if the goal of a user is
simply to locate a copy, where ever it is held.


> Umlaut already includes the 'naive' "just link to worldcat.org based on
> isbn, oclcnum, or lccn" approach, functionality that was written before the
> worldcat api existed. That is, Umlaut takes an incoming OpenURL, and provides
> the user with a link to a worldcat record based on isbn, oclcnum, or lccn.
>

Many institutions have chosen to do this. MPOW, however, represents a
counter-example and does not link out to OCLC.

Tom


Re: [CODE4LIB] WorldCat as an OpenURL endpoint ?

2010-06-15 Thread Jonathan Rochkind

Tom Keays wrote:

> I was mainly thinking of sources that use COinS. If you have a rarely held
> book, for instance, then OpenURLs resolved against random institutional
> endpoints are going to mostly be unproductive. However, a "union" catalog
> such as OCLC already has the information about libraries in the system that
> own it. It seems like the more productive path if the goal of a user is
> simply to locate a copy, where ever it is held.

Even if OCLC doesn't get to it, it would not be that hard to write your
own "wrapper" that accepts an OpenURL, uses the WorldCat API to search 
WorldCat, and then redirects the user to the 'best match' worldcat.org 
record.  That is for title-level (book and journal) records -- for 
article-level, forget it, as discussed.


The trick will be correctly identifying the 'best match' if the openurl 
does not have an oclcnum/lccn/isbn, but only has author/title/year.  But 
my experiments with doing similar things lead me to be optimistic you
could get 'good enough' (but definitely not perfect) behavior here with 
an intermediate amount of work.


The 'wrapper' could of course present a list of options when it's not 
sure it can identify a single best match.
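
At the identifier end, such a wrapper can be tiny. A minimal sketch as a
Rack app (class name and parameter handling invented for illustration; the
author/title 'best match' search -- the real work -- is left out, and would
go through the WorldCat Search API, which needs a wskey):

  require 'rack'
  require 'cgi'

  # Sketch: take an OpenURL, redirect to worldcat.org when the request
  # carries a usable identifier, else fall back to a title search.
  class WorldcatRedirector
    def call(env)
      p = Rack::Utils.parse_query(env['QUERY_STRING'])
      if p['rft_id'].to_s =~ %r{\Ainfo:oclcnum/(\d+)\z}
        redirect("http://www.worldcat.org/oclc/#{$1}")
      elsif (isbn = p['rft.isbn'].to_s) != ''
        redirect("http://www.worldcat.org/isbn/#{isbn.delete('^0-9Xx')}")
      else
        title = p['rft.btitle'] || p['rft.jtitle'] || p['rft.title'] || ''
        redirect("http://www.worldcat.org/search?q=#{CGI.escape(title.to_s)}")
      end
    end

    def redirect(url)
      [302, { 'Location' => url }, []]
    end
  end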


Jonathan




  

> > Umlaut already includes the 'naive' "just link to worldcat.org based on
> > isbn, oclcnum, or lccn" approach, functionality that was written before the
> > worldcat api existed. That is, Umlaut takes an incoming OpenURL, and provides
> > the user with a link to a worldcat record based on isbn, oclcnum, or lccn.
>
> Many institutions have chosen to do this. MPOW, however, represents a
> counter-example and does not link out to OCLC.
>
> Tom


Re: [CODE4LIB] WorldCat as an OpenURL endpoint ?

2010-06-15 Thread Tom Keays
On Mon, Jun 14, 2010 at 3:47 PM, Jonathan Rochkind  wrote:

> The trick here is that traditional library metadata practices make it _very
> hard_ to tell if a _specific volume/issue_ is held by a given library.  And
> those are the most common use cases for OpenURL.
>

Yep. That's true even for individual libraries with link resolvers. OCLC is
not going to be able to solve that particular issue until the local
libraries do.


> If you just want to get to the title level (for a journal or a book), you
> can easily write your own thing that takes an OpenURL, and either just
> redirects straight to worldcat.org on isbn/lccn/oclcnum, or actually does
> a WorldCat API lookup to ensure the record exists first and/or looks up on
> author/title/etc too.
>

I was mainly thinking of sources that use COinS. If you have a rarely held
book, for instance, then OpenURLs resolved against random institutional
endpoints are going to mostly be unproductive. However, a "union" catalog
such as OCLC already has the information about libraries in the system that
own it. It seems like the more productive path if the goal of a user is
simply to locate a copy, where ever it is held.


> Umlaut already includes the 'naive' "just link to worldcat.org based on
> isbn, oclcnum, or lccn" approach, functionality that was written before the
> worldcat api existed. That is, Umlaut takes an incoming OpenURL, and provides
> the user with a link to a worldcat record based on isbn, oclcnum, or lccn.
>

Many institutions have chosen to do this. MPOW, however, represents a
counter-example and does not link out to OCLC.

Tom


[CODE4LIB] new version of cql-ruby

2010-06-15 Thread Jonathan Rochkind
cql-ruby is a ruby gem for parsing CQL, and serializing parse trees back 
to CQL, to xCQL, or to a solr query.


A new version has been released, 0.8.0, available from gem update/install.

The new version improves greatly on the #to_solr serialization as a solr 
query, providing support for translation from more CQL relations than 
previously, fixing a couple of bugs, and making #to_solr raise appropriate
exceptions if you try to convert CQL that is not supported by
#to_solr.  See: 
http://cql-ruby.rubyforge.org/svn/trunk/lib/cql_ruby/cql_to_solr.rb


That's the only change from the previous version, improved #to_solr.
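
Usage looks roughly like this (a sketch from memory -- check the gem's
docs for the exact API):

  require 'cql_ruby'

  tree = CqlRuby::CqlParser.new.parse('dc.title = frog')
  tree.to_cql    # round-trip back to CQL
  tree.to_xcql   # xCQL (XML) representation
  tree.to_solr   # solr query; raises if the CQL isn't supported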

I wrote the improved #to_solr, Chick Markley wrote the original cql-ruby 
gem, which was a port of the Java CQL parsing code by Mike Taylor. Ain't 
open source grand?


Jonathan


[CODE4LIB] proposal deadline approaches

2010-06-15 Thread EdUI Conference
For those of you looking for an opportunity to showcase a project, talk
about web design, user experience, or just about anything web, here's a
quick reminder that the edUi 2010 deadline for proposals is about one
month away.

What is edUi?
A learning opportunity for web professionals serving institutions of
learning.

When is edUi 2010?
November 8-9, 2010

Where is edUi 2010?
Charlottesville, VA

Thanks!

-Trey


[CODE4LIB] open source software for libraries (was: planet code4lib code)

2010-06-15 Thread Jakob Voss

Hi,

I stumbled upon this message from Galen Charlton from March this year:


> On Sun, Mar 28, 2010 at 6:08 PM, Jonathan Rochkind wrote:
>
> > Plus'ing it is one thing, but I have no idea what such a thing
> > would actually look like (interface-wise), or how it would be
> > accomplished. I'm not sure what it means exactly. It's an
> > interesting idea, but anyone have any idea what it would actually
> > look like?
>
> Perhaps as a sideways start we could use the 'code4lib' tag on
> Ohloh to link projects together?


I tried to find all library-specific software projects at Ohloh and
tagged them with 'code4lib'. Please add and update your favorite OSS
software (description, tags, repositories...) so we can get code
analysis and send 'kudos' points to each other. Ohloh seems to be
kind of a Facebook or Farmville for Open Source developers ;-)


https://www.ohloh.net/tags/code4lib

Ohloh also has an API so this list could be embedded on other pages.

Cheers
Jakob

--
Jakob Voß , skype: nichtich
Verbundzentrale des GBV (VZG) / Common Library Network
Platz der Goettinger Sieben 1, 37073 Göttingen, Germany
+49 (0)551 39-10242, http://www.gbv.de


Re: [CODE4LIB] Twitter annotations and library software

2010-06-15 Thread Jakob Voss

On 07.06.2010 16:15, Jay Luker wrote:

> Hi all,
>
> I found this thread rather interesting and figured I'd try and revive
> the convo since apparently some things have been happening in the
> twitter annotation space in the past month. I just read on techcrunch
> that testing of the annotation features will commence next week [1].
> Also it appears that an initial schema for a "book" type has been
> defined [2].


> [1] http://techcrunch.com/2010/06/02/twitter-annotations-testing/
> [2] http://apiwiki.twitter.com/Annotations-Overview#RecommendedTypes
>

> Have any code4libbers gotten involved in this beyond just opining on list?


I don't think so -- the discussion slipped to general data modelling
questions. For the specific, limited use case of twitter annotations I
bet the recommended format from [2] will be fine (title is implied as a
common attribute, url is optional):


{"book":{
  "title": "...",
  "author": "...",
  "isbn": "...",
  "year": "",
  "url": "..."
}}

The only thing I miss is an "article" type with a "doi" field for non-books.
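
By analogy with the "book" type above, a hypothetical "article" type (my
sketch, not something in [2]) might look like:

{"article":{
  "title": "...",
  "author": "...",
  "doi": "...",
  "year": "",
  "url": "..."
}}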

Cheers,
Jakob


--
Jakob Voß , skype: nichtich
Verbundzentrale des GBV (VZG) / Common Library Network
Platz der Goettinger Sieben 1, 37073 Göttingen, Germany
+49 (0)551 39-10242, http://www.gbv.de