Re: [CODE4LIB] web services and widgets: MAJAX 2, ticTOC lookup, Link/360 JSON, and Google Book Classes

2009-05-19 Thread Boheemen, Peter van
Clever idea to put the TicToc stuff 'in the cloud'. How are you going to
keep it up to date?

Peter 

-Original Message-
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Godmar Back
Sent: Tuesday, May 19, 2009 6:03
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] web services and widgets: MAJAX 2, ticTOC lookup,
Link/360 JSON, and Google Book Classes

Hi,

I would like to share a few pointers to web services and widgets Annette
and I recently collaborated on. All are available under an open source
license.

Widgets are CSS-styled HTML elements (span or div) that provide
dynamic behavior related to the underlying web service. These are
suitable for non-JavaScript programmers familiar with HTML/CSS.

1. MAJAX 2: Includes a JSON web service (e.g.,
http://libx.lib.vt.edu/services/majax2/isbn/1412936373 or
http://libx.lib.vt.edu/services/majax2/isbn/006073132x?opacbase=http%3A%2F%2Flibcat.lafayette.edu%2Fsearch&jsoncallback=majax.processResults)
and a set of widgets to include results in web pages; see
http://libx.lib.vt.edu/services/majax2/. It supports the same set of
features as MAJAX 1 (libx.org/majax). Source is at
http://code.google.com/p/majax2/
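
For instance, to look at the raw JSON from the command line (any HTTP
client will do; curl shown here):

$ curl 'http://libx.lib.vt.edu/services/majax2/isbn/1412936373'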

2. ticTOC lookup: a Google App Engine app that provides a REST
interface to JISC's ticTOCs data set, which maps ISSNs to the URLs of
table-of-contents RSS feeds. See http://tictoclookup.appspot.com/
Example: http://tictoclookup.appspot.com/0028-0836 and optional
refinement by title:
http://tictoclookup.appspot.com/0028-0836?title=Nature
A widget library is available; see
http://laurel.lib.vt.edu/record=b1251610~S7 for a demo (it shows
floating tooltips with a table-of-contents preview via Google Feeds
and places a link to the RSS feed). The source is at
http://code.google.com/p/tictoclookup/ and includes a stand-alone
version of the web service that doesn't use GAE. The widget library
includes support for integration into III's record display.
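
To try the lookup from the command line (the response is JSON, so
piping it through a pretty-printer helps):

$ curl -s 'http://tictoclookup.appspot.com/0028-0836?title=Nature' | python -m json.tool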

3. Google Book Classes at http://libx.lib.vt.edu/services/googlebooks/
- these are widgets for Google's Book Search Dynamic Links API.
Noteworthy is support for integration into III's OPAC on the search
results page (briefcit.html), on the so-called bib display page
(bib_display.html), and into their WebBridge product via field
selectors, all without writing JavaScript. Source is at
http://code.google.com/p/googlebooks/
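
(If you want to see the data the widgets work with, Google's Dynamic
Links API can be queried directly; the parameters below are taken from
Google's documentation for that API, and the callback name is arbitrary:)

$ curl 'http://books.google.com/books?jscmd=viewapi&bibkeys=ISBN:006073132x&callback=handleGBS'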

4. A Link/360 JSON proxy.  See
http://libx.lib.vt.edu/services/link360/index.html
This one takes Serials Solutions' Link/360 XML service and proxies it as
JSON. It currently does not include a widget set, and it caches results
for 24 hours to match the database update frequency.  Source is at
http://code.google.com/p/link360/  It could be combined with a widget
library, or programmed against directly, to weave Link/360 holdings data
into pages.

All JSON services accept 'jsoncallback=' for cross-domain client-side
integration.  The libx.lib.vt.edu URLs are OK to use for testing, but
for production use we recommend running your own server. All modules are
written in Python as WSGI scripts, requiring a setup as simple as
mod_wsgi + .htaccess.
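
As a quick illustration of the jsoncallback wrapping (using the same
callback name as in the example URL above; any name your page defines
will do):

$ curl 'http://libx.lib.vt.edu/services/majax2/isbn/1412936373?jsoncallback=majax.processResults'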

 - Godmar


Re: [CODE4LIB] web services and widgets: MAJAX 2, ticTOC lookup, Link/360 JSON, and Google Book Classes

2009-05-19 Thread Godmar Back
On Tue, May 19, 2009 at 8:26 AM, Boheemen, Peter van
peter.vanbohee...@wur.nl wrote:
 Clever idea to put the TicToc stuff 'in the cloud'. How are you going to
 keep it up to date?

By periodically reuploading the entire set (which takes about 15-20
mins), new or changed records can be updated. A changed record is one
with a new RSS feed for the same ISSN + Title combination; the data is
keyed by ISSN+Title. This process can be optimized by only uploading
the delta (you upload .csv files, so the delta can be obtained easily
via comm(1)).
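
For example, with yesterday's and today's dumps in old.csv and new.csv
(file names just for illustration; comm wants sorted input):

$ sort old.csv > old.sorted ; sort new.csv > new.sorted
$ comm -13 old.sorted new.sorted > additions.csv
$ comm -23 old.sorted new.sorted > removals.csv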

Removing records is a bit of a hassle since GAE does not provide an
easy-to-use interface for that. It's possible to wipe an entire table
clean by repeatedly deleting 500 records at a time (the entire set is
about 19,000 records), then doing a fresh import. This can be done by
uploading a console application into the cloud
(http://con.appspot.com/console/help/about). Alternatively, smaller
sets of records could be deleted via a remove handler, which I haven't
implemented yet; a script would then need to post the data to be removed
to that handler. I will add that if anybody uses it. User impact is low
if old records aren't removed.

A possible alternative is to have the GAE app periodically verify the
validity of each requested record with a server we'd have to run.
(Pulling the data straight from tictocs.ac.uk doesn't work since it's
larger than what you're allowed to fetch.) This approach would somewhat
defeat the idea of the cloud, since we'd have to rely on keeping that
server operational, albeit at a lower degree of availability and load.

Another potential issue is the quota Google provides: you get 10 GBytes
and 1.3M requests free per 24-hour period, after which they start
charging you ($.12 per GByte).

I think I mentioned in my post that I included a non-GAE version of
the server that only requires mod_wsgi. For that standalone version,
keeping the data set up to date is implemented by checking the last
modification time of its local copy - it will reread its data when it
detects a more recent jrss.txt in its current directory, so keeping its
data up to date is as simple as periodically curling
http://www.tictocs.ac.uk/text.php
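
e.g., with a nightly cron entry along these lines (the target path is
whatever directory the standalone WSGI script runs from):

0 4 * * * curl -s -o /path/to/app/jrss.txt http://www.tictocs.ac.uk/text.php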

 - Godmar


[CODE4LIB] A Book Grab by Google

2009-05-19 Thread st...@archive.org

fyi - [the Google Book Settlement] should not be approved


A Book Grab by Google
by Brewster Kahle
Tuesday, May 19, 2009
Washington Post | Opinions
http://www.washingtonpost.com/wp-dyn/content/article/2009/05/18/AR2009051802637.html


/st...@archive.org


Re: [CODE4LIB] A Book Grab by Google

2009-05-19 Thread Ethan Gruber
Google isn't a dumb company. They knew this would be the result all along.
The real losers here are the libraries, especially the ones that funded the
packaging and transport of their materials to the Google scanning centers
(because Google didn't pay for that, fyi). But hey, it looks good to be part
of such a prestigious group of libraries in partnering with Google to
deliver content freely* to the public!

*not free

Pardon my cynicism,
Ethan Gruber


On Tue, May 19, 2009 at 12:50 PM, st...@archive.org wrote:

 fyi - [the Google Book Settlement] should not be approved


 A Book Grab by Google
 by Brewster Kahle
 Tuesday, May 19, 2009
 Washington Post | Opinions

 http://www.washingtonpost.com/wp-dyn/content/article/2009/05/18/AR2009051802637.html


 /st...@archive.org



Re: [CODE4LIB] A Book Grab by Google

2009-05-19 Thread Roy Tennant
On 5/19/09 9:59 AM, Ethan Gruber ewg4x...@gmail.com wrote:

 Google isn't a dumb company.  They knew this would be the result all along.
 The real losers here are the libraries, especially the ones that funded the
 packaging and transport of their materials to the Google scanning centers
 (because Google didn't pay for that, fyi)  But hey, it looks good to be part
 of such a prestigious group of libraries in partnering with Google to
 deliver content freely* to the public!
 
 *not free
 
 Pardon my cynicism,
 Ethan Gruber

I think that's an overly pessimistic assessment. There is a growing corpus
of freely available content being managed by the Hathi Trust[1], that
already numbers in the hundreds of thousands of volumes, and soon likely to
be over a million. Also, since government documents are included, there is a
surprising number of post-1923 public domain titles. So let's not rush to
throw the baby out with the bathwater.
Roy

[1] http://www.hathitrust.org/


Re: [CODE4LIB] A Book Grab by Google

2009-05-19 Thread Karen Coyle

Roy Tennant wrote:



I think that's an overly pessimistic assessment. There is a growing corpus
of freely available content being managed by the Hathi Trust[1], that
already numbers in the hundreds of thousands of volumes, and soon likely to
be over a million. Also, since government documents are included, there is a
surprising number of post-1923 public domain titles. So let's not rush to
throw the baby out with the bathwater.
Roy

  
Roy, not sure what one has to do with the other. The Google settlement 
only relates to in-copyright, out-of-print works, so the settlement works 
and what Hathi Trust makes freely available appear to be a null set (no 
overlap).


kc

--
---
Karen Coyle / Digital Library Consultant
kco...@kcoyle.net http://www.kcoyle.net
ph.: 510-540-7596   skype: kcoylenet
fx.: 510-848-3913
mo.: 510-435-8234



Re: [CODE4LIB] A Book Grab by Google

2009-05-19 Thread Jonathan Rochkind

It's true that we have buns in the oven that are promising.

But it's also worth noting that HathiTrust mainly came about via the 
Google partnership, and they have certain limitations on what they can 
do with the scans that came out of that partnership (currently the vast 
majority of their collection), as a result of agreements with Google. 
For instance, they can _not_ bulk re-distribute the digitized scans, not 
even of public domain works. They also need to have security measures in 
place on their website to prevent other people from conveniently bulk 
downloading via scraping (notice how you can't download a complete PDF 
of even a public domain work from HathiTrust?).


But don't get me wrong, HathiTrust is a very important project, and 
under good stewardship.  It's not like all is lost or something; we're 
just at the _beginning_ of the history of the digitized book universe, 
not at the end. 

But I agree with Ethan, and I hope that libraries in the future actually 
consider their own interests when making deals with for-profit third 
parties. Like, literally, take a moment to consider what your interests 
ARE - the first step before making sure they are protected by the 
contracts you sign. I get the feeling many libraries didn't even do that.


Jonathan

Roy Tennant wrote:

On 5/19/09 9:59 AM, Ethan Gruber ewg4x...@gmail.com wrote:

  

Google isn't a dumb company.  They knew this would be the result all along.
The real losers here are the libraries, especially the ones that funded the
packaging and transport of their materials to the Google scanning centers
(because Google didn't pay for that, fyi)  But hey, it looks good to be part
of such a prestigious group of libraries in partnering with Google to
deliver content freely* to the public!

*not free

Pardon my cynicism,
Ethan Gruber



I think that's an overly pessimistic assessment. There is a growing corpus
of freely available content being managed by the Hathi Trust[1], that
already numbers in the hundreds of thousands of volumes, and soon likely to
be over a million. Also, since government documents are included, there is a
surprising number of post-1923 public domain titles. So let's not rush to
throw the baby out with the bathwater.
Roy

[1] http://www.hathitrust.org/

  


Re: [CODE4LIB] A Book Grab by Google

2009-05-19 Thread Eric Lease Morgan

On May 19, 2009, at 1:08 PM, Roy Tennant wrote:


But hey, it looks good to be part of such a prestigious
group of libraries in partnering with Google to
deliver content freely* to the public!


...I think that's an overly pessimistic assessment. There
is a growing corpus of freely available content being
managed by the Hathi Trust[1],...




[1] http://www.hathitrust.org/



Yea, but, but...

Yes, I do think Google is making a book grab, especially when it comes  
to orphan works, but on the other hand, the HathiTrust is not a bed of  
roses either, because even for public domain works it is not possible  
to download the whole book without going through semi-heroic efforts.


I applaud the Internet Archive and the Open Content Alliance's  
efforts.  archive.org++


BTW, we are sponsoring a mini-symposium on the topic of mass  
digitization here at Notre Dame, tomorrow:


  http://www.library.nd.edu/symposium/

--
Eric Lease Morgan
Hesburgh Libraries, University of Notre Dame

(574) 631-8604


Re: [CODE4LIB] A Book Grab by Google

2009-05-19 Thread Joe Atzberger
 BTW, we are sponsoring a mini-symposium on the topic of mass digitization
 here at Notre Dame, tomorrow:

  http://www.library.nd.edu/symposium/


Nice timing.

--joe


Re: [CODE4LIB] A Book Grab by Google

2009-05-19 Thread Tim Spalding
 BTW, we are sponsoring a mini-symposium on the topic of mass digitization
 here at Notre Dame, tomorrow:

Any protesters expected? ;)

T


Re: [CODE4LIB] A Book Grab by Google [hack]

2009-05-19 Thread Eric Lease Morgan

On May 19, 2009, at 1:24 PM, Eric Lease Morgan wrote:


I applaud the Internet Archive and the Open Content Alliance's
efforts.  archive.org++




Try this hack with Google Books, not.

$ echo http://ia300206.us.archive.org/3/items/librariesreaders00fostuoft/ >> libraries.urls

$ echo http://ia310827.us.archive.org/0/items/developmentofchi00tancuoft/ >> libraries.urls

$ echo http://ia310832.us.archive.org/2/items/rulesregulations00brituoft/ >> libraries.urls

$ echo 'wget -erobots=off --wait 1 -np -m -nd -A _djvu.txt,.pdf,.gif,_marc.xml -R _bw.pdf -i $1' > mirror.sh


$ chmod +x mirror.sh

$ ./mirror.sh libraries.urls


--
ELM


Re: [CODE4LIB] A Book Grab by Google [hack]

2009-05-19 Thread raj kumar

On May 19, 2009, at 10:40 AM, Eric Lease Morgan wrote:


On May 19, 2009, at 1:24 PM, Eric Lease Morgan wrote:


I applaud the Internet Archive and the Open Content Alliance's
efforts.  archive.org++


Try this hack with Google Books, not.

$ echo http://ia300206.us.archive.org/3/items/librariesreaders00fostuoft/ >> libraries.urls

$ echo http://ia310827.us.archive.org/0/items/developmentofchi00tancuoft/ >> libraries.urls

$ echo http://ia310832.us.archive.org/2/items/rulesregulations00brituoft/ >> libraries.urls

$ echo 'wget -erobots=off --wait 1 -np -m -nd -A _djvu.txt,.pdf,.gif,_marc.xml -R _bw.pdf -i $1' > mirror.sh


$ chmod +x mirror.sh

$ ./mirror.sh libraries.urls


Here is a script that will let you download all the books from  
archive.org:


http://blog.openlibrary.org/2008/11/24/bulk-access-to-ocr-for-1-million-books/

You'll have to slightly modify it to download the format you want...

-raj


Re: [CODE4LIB] A Book Grab by Google [hack]

2009-05-19 Thread st...@archive.org

also, if your script can handle a redirect, you can use
our locator to find each item, e.g.

http://www.archive.org/download/librariesreaders00fostuoft/
http://www.archive.org/download/developmentofchi00tancuoft/
http://www.archive.org/download/rulesregulations00brituoft/

as the data does migrate occasionally for maintenance.
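
wget follows HTTP redirects, so the mirror.sh from earlier in the thread
should work against the locator URLs as well - something along these
lines (untested; recursive retrieval may additionally want
-H/--span-hosts once the redirect lands on a different host):

$ echo http://www.archive.org/download/librariesreaders00fostuoft/ >> locator.urls
$ ./mirror.sh locator.urls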


/st...@archive.org



On 5/19/09 10:51 AM, raj kumar wrote:

On May 19, 2009, at 10:40 AM, Eric Lease Morgan wrote:


On May 19, 2009, at 1:24 PM, Eric Lease Morgan wrote:


I applaud the Internet Archive and the Open Content Alliance's
efforts.  archive.org++


Try this hack with Google Books, not.

$ echo http://ia300206.us.archive.org/3/items/librariesreaders00fostuoft/ >> libraries.urls

$ echo http://ia310827.us.archive.org/0/items/developmentofchi00tancuoft/ >> libraries.urls

$ echo http://ia310832.us.archive.org/2/items/rulesregulations00brituoft/ >> libraries.urls

$ echo 'wget -erobots=off --wait 1 -np -m -nd -A _djvu.txt,.pdf,.gif,_marc.xml -R _bw.pdf -i $1' > mirror.sh


$ chmod +x mirror.sh

$ ./mirror.sh libraries.urls


Here is a script that will let you download all the books from archive.org:

http://blog.openlibrary.org/2008/11/24/bulk-access-to-ocr-for-1-million-books/ 



You'll have to slightly modify it to download the format you want...

-raj


Re: [CODE4LIB] vuFIND -- long term direction

2009-05-19 Thread Lovins, Daniel
No intention to spoil our users and our fun with vuFIND. Just taking stock 
after 1-2 years of working with vuFIND, here are a few general questions,

Hi Yaaqov.

Here's my take (and I'm cc'ing several of my colleagues in case they want to 
add or dispute anything):

1. is vuFIND primarily an experiment? if not why haven't more sites switched to 
production like VU or NLA?

We consider it beta, but still more than an experiment. We're planning 
enhancements this summer: regenerating our Solr index with SolrMarc 2.0 (to 
get the spell checker, linked 880s, etc.), implementing nightly syncs, 
implementing patron empowerment features (CAS + MyResearch module), and 
adding server redundancy; longer term we're planning to enhance non-Roman 
script functionality and search engine optimization, and to add non-MARC 
metadata (journal articles? visual images? government documents? we're still 
debating).

2. is SOLR indexing satisfactory?

I would say 'yes', though there's certainly a lot of customization we'd like to 
do.

3. how many staff are needed for vuFIND's viable maintenance? for 
developing/adding features?

We've had a project manager and programmer analyst each at 50% over the past 
year, and smaller amounts of time allocated from usability, web design, 
systems, and other kinds of librarians. I would say we could have used much 
more (and we'll be ramping up this summer and beyond).

4. do we want vuFIND to measure up to the ILS (Voyager, Aleph, III, etc.) OPAC, 
or do we like it as an alternative, for its discovery tools?

As others have pointed out, VuFind is already superior to traditional OPACs in 
certain ways: relevancy ranking is extremely powerful. Faceted navigation is 
great, too, but there's a lot of tweaking that needs to be done in our case. 
The traditional OPAC is currently better at integrating authority files and 
displaying complex indexes (e.g., authors sub-arranged by titles).

5. what is plan B at your institution for when the vuFIND guru leaves?

We're planning to continue running Voyager and VuFind in parallel. Developments 
are moving so fast that it's hard to predict the landscape 5 years from now, 
but in the meantime, VuFind gives us an excellent open platform both for 
production and R&D.

6. do we need someone to co-work with Andrew on the installation package? on 
keeping track of developments, road maps?

Great idea. I think some of my Yale colleagues who are real programmers are 
planning to contribute code. I hope to contribute in other ways (e.g., on 
integration of authority files, analysis/enhancement of non-Western language 
functionality).

7. which collections, other than the catalog and OAI repositories, have we 
added? which APIs to other collections have we installed? are they now tested?

So far we've only indexed our catalog MARC data. We've done some testing of 
other record sets (e.g., visual image records).

I'm sure many of us cope with these questions and could benefit from the 
variety of our replies. Kindest thanks,

Ya'aqov Ziso, Electronic Resource Management Librarian, Rowan University 856 
256 4804