Re: [CODE4LIB] Internet Archive collection codes?

2008-06-05 Thread Klein, Michael
Peter,

I've seen no official information or documentation from the Internet Archive
either. I've actually been quite frustrated by several issues for a while
now. For example: If you go to
http://www.archive.org/details/nonexistentidentifier you'll get a
human-readable web page stating that the item cannot be found. That page,
however, is served up with an HTTP status of 200 OK, not 404 NOT FOUND.

In addition, I've noticed that when certain requests fail due to system load
and other issues, I get back an HTML page saying something like the system
is experiencing slowness, but again with a 200 OK instead of a 503 SERVICE
UNAVAILABLE (ideally with a Retry-After header).

These things alone make it extremely difficult to automate any large-scale
metadata retrieval from the Internet Archive, and that's without any attempt
to download content.
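
A minimal sketch of one way a harvester might cope with this, assuming the error pages can be recognised by some phrase in their HTML. The marker strings below are placeholders rather than the Internet Archive's actual wording, and classify_response is a hypothetical helper. It takes the status code and body as plain arguments, so it assumes nothing about which HTTP client is in use.

```python
from enum import Enum


class Outcome(Enum):
    OK = "ok"
    NOT_FOUND = "not_found"
    UNAVAILABLE = "unavailable"


# Placeholder phrases (assumptions): replace them with text actually
# observed on the error pages. They must be lowercase, because the body
# is lowercased before matching.
NOT_FOUND_MARKERS = ("cannot be found",)
UNAVAILABLE_MARKERS = ("experiencing slowness",)


def classify_response(status, body,
                      not_found_markers=NOT_FOUND_MARKERS,
                      unavailable_markers=UNAVAILABLE_MARKERS):
    """Classify a response, treating some 200 pages as disguised errors."""
    # Honour real error codes when the server does send them.
    if status == 404:
        return Outcome.NOT_FOUND
    if status == 503:
        return Outcome.UNAVAILABLE
    # Otherwise look inside the body for the tell-tale phrases.
    text = body.lower()
    if any(marker in text for marker in not_found_markers):
        return Outcome.NOT_FOUND
    if any(marker in text for marker in unavailable_markers):
        return Outcome.UNAVAILABLE
    # Simplification: any other non-2xx status is treated as retryable.
    return Outcome.OK if 200 <= status < 300 else Outcome.UNAVAILABLE
```

Phrase matching like this is fragile: a legitimate item page that happens to contain one of the phrases would be misclassified, so the markers should be as specific as the real pages allow.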

I'm working on a post documenting some of the techniques and strategies that
have worked for us, but it's not quite ready for human consumption yet.

Michael

--
Michael B. Klein
Digital Initiatives Technology Librarian
Boston Public Library
[EMAIL PROTECTED]


 From: Binkley, Peter [EMAIL PROTECTED]
 Reply-To: Code for Libraries CODE4LIB@LISTSERV.ND.EDU
 CODE4LIB@LISTSERV.ND.EDU
 Date: Thu, 5 Jun 2008 13:08:13 -0600
 To: CODE4LIB@LISTSERV.ND.EDU
 Conversation: [CODE4LIB] Internet Archive collection codes?
 Subject: Re: [CODE4LIB] Internet Archive collection codes?

 While we're on the subject, are there any more up-to-date instructions
 for harvesting from Internet Archive than these?
 http://biodiversitylibrary.blogspot.com/2008/03/harvesting-process-from-internet_14.html

 And does IA provide guidelines for harvesting (traffic limits etc.)? I
 clicked around the site a bit and didn't find them, but could easily
 have missed them.

 Peter


[CODE4LIB] Open Library API

2008-06-05 Thread Cloutman, David
Inspired by a thread on this list yesterday, I started playing with the
Open Library API. In order to query through the API, you must pass a
query as a JSON serialized object. That's good, and it could be great,
given that for Java and PHP (at least) there already exists the ability
to serialize a native data type into and out of JSON. The problem that
I'm noticing is that, at least in the querying process, the naming
conventions used by the API complicate this.

For instance, in order to do a pattern search of the key field, one
must pass the identifier of that field with a tilde (~) appended to that
field, so that a query would read like this:

{
    "key~": "\/about\/*"
}

The problem is that in the two programming languages I use, Java and
PHP, the variable names key~ and $key~ are illegal, and I believe that is
the case for most programming languages. Thus, this PHP class (and its
Java analog) would fail at compile / parse time:

class OpenLibraryQuery {
    public $key~;  // parse error: "~" is not legal in a property name

    public function __construct($keyValue) {
        $this->key~ = $keyValue;
    }
}

This is a problem, because ideally, I would like to be able to do
essentially this:

$query = json_encode(new OpenLibraryQuery('\/about\/*'));

which, if the above class did parse, would automatically assign $query a
valid JSON string, similar to what is above. Instead, I either have to
rename my variable, or use string manipulation to make the string work.
Note that

$query = json_encode(array('key~' => '\/about\/*'));

is not parsed by the API, and results in an error message.

This leaves me with three questions:

1. Is there an easy way around this, other than string manipulation,
that I am missing? Does the solution work for most or all programming
languages?

2. Does this strike readers as a significant enough issue to raise with
the API developers?

3. Given that Open Library runs on Infogami and has other dependencies,
does this strike readers as something that can be remedied?
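
On question 1, a minimal sketch of one possible workaround, shown in Python rather than PHP or Java: keep a legal attribute name in the class and translate it to the API's key~ only at serialization time. The class name follows the example above; the key_like attribute, the to_dict method and the FIELD_NAMES table are hypothetical and not part of the Open Library API. The pattern is passed with plain slashes, on the assumption that any escaping is best left to the JSON encoder.

```python
import json


class OpenLibraryQuery:
    # Maps legal attribute names to the field names the API expects
    # (hypothetical table; only the key~ name comes from the example above).
    FIELD_NAMES = {"key_like": "key~"}

    def __init__(self, key_like):
        self.key_like = key_like

    def to_dict(self):
        """Return the query with attribute names swapped for API field names."""
        return {self.FIELD_NAMES.get(name, name): value
                for name, value in vars(self).items()}


query = json.dumps(OpenLibraryQuery("/about/*").to_dict())
print(query)  # {"key~": "/about/*"}
```

The same idea should carry over to PHP, where an associative array accepts 'key~' as a key, and to Java, where a Map with String keys does. Note that PHP's json_encode escapes both backslashes and slashes by default, so a single-quoted '\/about\/*' is emitted as \\\/about\\\/*, which may be one reason the array version is rejected.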


- David


---
David Cloutman [EMAIL PROTECTED]
Electronic Services Librarian
Marin County Free Library

Email Disclaimer: http://www.co.marin.ca.us/nav/misc/EmailDisclaimer.cfm


Re: [CODE4LIB] Open Library API

2008-06-05 Thread Karen Coyle

I see no reason not to send this along to the developers. I don't know
if key has some special significance or if something else could be
easily substituted. The API is very new and hasn't been used much, so
it's good to surface things of this nature. (Note: I consult on the bib
data aspect of the OL, but also took a stab at expanding some of the
text in the API document because the first version was terser than terse.)

Do you have someone in mind to send it to? If not, Alexis can probably
forward it to the right people.

kc

--
---
Karen Coyle / Digital Library Consultant
[EMAIL PROTECTED] http://www.kcoyle.net
ph.: 510-540-7596   skype: kcoylenet
fx.: 510-848-3913
mo.: 510-435-8234



[CODE4LIB] Sorry

2008-06-05 Thread Cloutman, David
Disregard my last post. I replied to the wrong email.



---
David Cloutman [EMAIL PROTECTED]
Electronic Services Librarian
Marin County Free Library

-----Original Message-----
From: Code for Libraries [mailto:[EMAIL PROTECTED]] On Behalf Of
Jonathan Rochkind
Sent: Thursday, June 05, 2008 3:16 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] refworks developer documentation?


Does anyone know where, if anywhere, I can find documentation on the ways
to send references to RefWorks for importing?

Not having any luck on their website. I know I've seen it before
though.  I remember there were a variety of formats and methods you
could send things to RefWorks for an import. Must be documentation
somewhere?  I bet some code4libber has done this before.

Jonathan

--
Jonathan Rochkind
Digital Services Software Engineer
The Sheridan Libraries
Johns Hopkins University
410.516.8886
rochkind (at) jhu.edu
