Re: [CODE4LIB] Internet Archive collection codes?

2008-06-05 Thread Klein, Michael
Peter,

I've seen no official information or documentation from the Internet Archive
either. I've actually been quite frustrated by several issues for a while
now. For example: If you go to
http://www.archive.org/details/nonexistentidentifier you'll get a
human-readable web page stating that the item cannot be found. That page,
however, is served up with an HTTP status of 200 OK, not 404 NOT FOUND.

In addition, I've noticed that when certain requests fail due to system load
and other issues, I get back an HTML page saying something like the system
is experiencing slowness, but again with a 200 OK instead of a 503 SERVICE
UNAVAILABLE (ideally with a Retry-After header).

These things alone make it extremely difficult to automate any large-scale
metadata retrieval from the Internet Archive, and that's without any attempt
to download content.
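
One way to work around this in a harvesting script is to inspect the body of every 200 response before trusting it. The following is only a minimal sketch: the marker phrases are paraphrased from the pages described above and are assumptions (the real wording on archive.org may differ), and `effective_status` is a hypothetical helper name, not part of any Internet Archive API.

```python
# Hypothetical marker phrases, paraphrased from the error pages described
# above. Inspect real responses and adjust these before relying on them.
NOT_FOUND_MARKERS = ("cannot be found",)
OVERLOAD_MARKERS = ("experiencing slowness",)


def effective_status(status: int, body: str) -> int:
    """Return the status code the response arguably should have carried.

    Non-200 responses are passed through unchanged. A 200 response whose
    body contains a known error phrase is mapped to 404 or 503.
    """
    if status != 200:
        return status
    text = body.lower()
    if any(marker in text for marker in NOT_FOUND_MARKERS):
        return 404
    if any(marker in text for marker in OVERLOAD_MARKERS):
        return 503
    return 200
```

A caller could then skip the item on 404 and sleep before retrying on 503. Plain substring matching can misfire on legitimate pages that happen to contain one of the phrases, so this is a heuristic rather than a reliable check.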

I'm working on a post documenting some of the techniques and strategies that
have worked for us, but it's not quite ready for human consumption yet.

Michael

--
Michael B. Klein
Digital Initiatives Technology Librarian
Boston Public Library
[EMAIL PROTECTED]


 From: Binkley, Peter [EMAIL PROTECTED]
 Reply-To: Code for Libraries <CODE4LIB@LISTSERV.ND.EDU>
 Date: Thu, 5 Jun 2008 13:08:13 -0600
 To: CODE4LIB@LISTSERV.ND.EDU
 Conversation: [CODE4LIB] Internet Archive collection codes?
 Subject: Re: [CODE4LIB] Internet Archive collection codes?

 While we're on the subject, are there any more up-to-date instructions
 for harvesting from Internet Archive than these?
 http://biodiversitylibrary.blogspot.com/2008/03/harvesting-process-from-internet_14.html

 And does IA provide guidelines for harvesting (traffic limits etc.)? I
 clicked around the site a bit and didn't find them, but could easily
 have missed them.

 Peter


Re: [CODE4LIB] Internet Archive collection codes?

2008-06-04 Thread Andrew Nagy
Excuse me if I am late to the game on this one - but at the Code4Lib conference
either Brewster Kahle or Aaron Swartz spoke about an API for either Open
Library or the Internet Archive.  Is this available, or are there any plans to
release it?  It seems like you are referring to some sort of API.

Andrew

 -----Original Message-----
 From: Code for Libraries [mailto:[EMAIL PROTECTED]] On Behalf Of
 [Alexis Rossi]
 Sent: Tuesday, June 03, 2008 10:58 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] Internet Archive collection codes?

 Hi,

 You can do a search for mediatype:collection to return results for all
 4200+ collections.

 We have a search interface that will return specific fields for this
 query in xml format, if you'd like, but I'll need to give you some
 permissions to access it.  Feel free to send me an email if you'd like
 to use that ([EMAIL PROTECTED]).

 Alexis




  Does anyone know where to get a list of Internet Archive collection
  codes and their human-displayable display labels?
 
  For instance:
  americana = American Libraries
  gutenberg = Project Gutenberg
  librivoxaudio = [hell if I know]
 
 
  Some of these I can 'scrape' from the quick search box popup on the IA
  website. But they're not all in there. And maybe there's a better place
  to get these?
 
  Anyone know where the right place to ask this of the IA and/or IA
  developer community is?
 
  Jonathan
 


Re: [CODE4LIB] Internet Archive collection codes? [open library api]

2008-06-04 Thread Eric Lease Morgan

On Jun 4, 2008, at 4:27 PM, Andrew Nagy wrote:


Excuse me if I am late to the game on this one - but at the
Code4Lib conference either Brewster Kahle or Aaron Swartz spoke
about an API for either Open Library or the Internet Archive.
Is this available, or are there any plans to release it?  It seems
like you are referring to some sort of API.



Yes, I believe the API for Open Library can be found at:

  http://demo.openlibrary.org:8080/dev/docs/api

--
Eric Lease Morgan


Re: [CODE4LIB] Internet Archive collection codes?

2008-06-04 Thread Jason Ronallo
Andrew,
I'm not sure this is the same thing that you were told about, but what
I discovered for IA after Jonathan sent out his message is here:
http://www.archive.org/advancedsearch.php#col2

It is just a redirect to a search of a Solr index, so it ought to be
easy for you to see what's going on. Do note that the address of the
Solr server may change, so you'll want to use the bookmark link.
You'll see that changing the xmlsearch param from bookmark to Search
bypasses the bookmark page.

Using this you can find all collection identifiers and names as Alexis
points out.
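
As a rough illustration, a request URL for that query could be assembled like this. Only the advancedsearch.php address, the mediatype:collection query, and the xmlsearch=Search value come from this thread; the other parameter names (`q`, `fl[]`, `rows`, `page`) are assumptions about what the form submits and should be checked against the form itself.

```python
from urllib.parse import urlencode

# Address taken from the link above (without the #col2 fragment).
BASE_URL = "http://www.archive.org/advancedsearch.php"


def collection_search_url(rows: int = 50, page: int = 1) -> str:
    """Build a search URL asking for the identifier and title of collections.

    Parameter names other than xmlsearch are assumed, not confirmed.
    """
    params = [
        ("q", "mediatype:collection"),  # query suggested by Alexis
        ("fl[]", "identifier"),         # assumed: one fl[] per returned field
        ("fl[]", "title"),
        ("rows", rows),                 # assumed: page size
        ("page", page),                 # assumed: 1-based page number
        ("xmlsearch", "Search"),        # bypasses the bookmark page
    ]
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(collection_search_url(rows=100))
```

Paging through all 4200+ collections would then be a matter of incrementing `page` until a response comes back empty, with a polite delay between requests.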

Jason

On Wed, Jun 4, 2008 at 4:27 PM, Andrew Nagy [EMAIL PROTECTED] wrote:
 Excuse me if I am late to the game on this one - but at the Code4Lib
 conference either Brewster Kahle or Aaron Swartz spoke about an API for either
 Open Library or the Internet Archive.  Is this available, or are there any
 plans to release it?  It seems like you are referring to some sort of API.

 Andrew




Re: [CODE4LIB] Internet Archive collection codes?

2008-06-03 Thread [Alexis Rossi]
Hi,

You can do a search for mediatype:collection to return results for all
4200+ collections.

We have a search interface that will return specific fields for this query
in xml format, if you'd like, but I'll need to give you some permissions
to access it.  Feel free to send me an email if you'd like to use that
([EMAIL PROTECTED]).
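
For anyone who gets such XML back, turning it into a code-to-label table is straightforward. The sketch below assumes the response uses the standard Solr XML layout (`<doc>` elements containing `<str name="...">` fields) and that the fields are called `identifier` and `title`; neither is confirmed for this interface. The sample document is made up for illustration, using the two labels already known in this thread.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the standard Solr XML layout; the real response
# from the Internet Archive may be shaped differently.
SAMPLE = """<response>
  <result name="response" numFound="2" start="0">
    <doc>
      <str name="identifier">americana</str>
      <str name="title">American Libraries</str>
    </doc>
    <doc>
      <str name="identifier">gutenberg</str>
      <str name="title">Project Gutenberg</str>
    </doc>
  </result>
</response>"""


def collection_labels(xml_text: str) -> dict[str, str]:
    """Map each collection identifier to its display title."""
    root = ET.fromstring(xml_text)
    labels = {}
    for doc in root.iter("doc"):
        # Collect the <str name="..."> children of this doc into a dict.
        fields = {el.get("name"): (el.text or "") for el in doc.findall("str")}
        if "identifier" in fields:
            labels[fields["identifier"]] = fields.get("title", "")
    return labels


if __name__ == "__main__":
    for code, label in collection_labels(SAMPLE).items():
        print(f"{code} = {label}")
```

Collections without a title end up with an empty label rather than being dropped, so gaps in the metadata stay visible.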

Alexis




 Does anyone know where to get a list of Internet Archive collection
 codes and their human-displayable display labels?

 For instance:
 americana = American Libraries
 gutenberg = Project Gutenberg
 librivoxaudio = [hell if I know]


 Some of these I can 'scrape' from the quick search box popup on the IA
 website. But they're not all in there. And maybe there's a better place to
 get these?

 Anyone know where the right place to ask this of the IA and/or IA
 developer community is?

 Jonathan