Re: [CODE4LIB] web services and widgets: MAJAX 2, ticTOC lookup, Link/360 JSON, and Google Book Classes
Clever idea to put the TicToc stuff 'in the cloud'. How are you going to keep it up to date?

Peter

-----Original Message-----
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Godmar Back
Sent: Tuesday, May 19, 2009 6:03
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] web services and widgets: MAJAX 2, ticTOC lookup, Link/360 JSON, and Google Book Classes

Hi,

I would like to share a few pointers to web services and widgets Annette and I recently collaborated on. All are available under an open source license. Widgets are CSS-styled HTML elements (span or div) that provide dynamic behavior related to the underlying web service. They are suitable for non-JavaScript programmers familiar with HTML/CSS.

1. MAJAX 2: includes a JSON web service (e.g., http://libx.lib.vt.edu/services/majax2/isbn/1412936373 or http://libx.lib.vt.edu/services/majax2/isbn/006073132x?opacbase=http%3A%2F%2Flibcat.lafayette.edu%2Fsearch&jsoncallback=majax.processResults ) and a set of widgets to include results in web pages; see http://libx.lib.vt.edu/services/majax2/ . It supports the same set of features as MAJAX 1 (libx.org/majax). Source is at http://code.google.com/p/majax2/

2. ticTOC lookup: a Google App Engine app that provides a REST interface to JISC's ticTOC data set, which maps ISSNs to the URLs of table-of-contents RSS feeds. See http://tictoclookup.appspot.com/ . Example: http://tictoclookup.appspot.com/0028-0836 , with optional refinement by title: http://tictoclookup.appspot.com/0028-0836?title=Nature . A widget library is available; see http://laurel.lib.vt.edu/record=b1251610~S7 for a demo (it shows floating tooltips with a table-of-contents preview via Google Feeds and places a link to the RSS feeds). The source is at http://code.google.com/p/tictoclookup/ and includes a stand-alone version of the web service that doesn't use GAE. The widget library includes support for integration into III's record display.

3. Google Book Classes at http://libx.lib.vt.edu/services/googlebooks/ : widgets for Google's Book Search Dynamic Links API. Noteworthy is support for integration into III's OPAC on the search results page (briefcit.html), on the so-called bib display page (bib_display.html), and into their WebBridge product via field selectors, all without JavaScript. Source is at http://code.google.com/p/googlebooks/

4. A Link/360 JSON proxy; see http://libx.lib.vt.edu/services/link360/index.html . This one takes Serials Solutions' Link/360 XML service and proxies it as JSON. It currently does not include a widget set. Results are cached for 24 hours to match the database's update frequency. Source is at http://code.google.com/p/link360/ . It could be combined with a widget library, or programmed against directly, to weave Link/360 holdings data into pages.

All JSON services accept a 'jsoncallback=' parameter for cross-domain client-side integration. The libx.lib.vt.edu URLs are ok to use for testing, but for production use we recommend running your own server. All modules are written in Python as WSGI scripts, requiring a setup as simple as mod_wsgi + .htaccess.

- Godmar
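As a quick illustration, a minimal server-side Python client for the MAJAX 2 ISBN service might look like the sketch below. The URL is the example from the post; the exact shape of the returned JSON isn't spelled out above, so the sketch just pretty-prints whatever the service returns. (Append the 'jsoncallback=' parameter instead when you need a JSONP-wrapped response for cross-domain script-tag use.)

  import json
  import urllib.request

  # Look up an ISBN against the MAJAX 2 JSON service (URL from the post above).
  isbn = "1412936373"
  url = "http://libx.lib.vt.edu/services/majax2/isbn/" + isbn
  with urllib.request.urlopen(url) as resp:
      records = json.load(resp)  # response schema not documented here; inspect it
  print(json.dumps(records, indent=2))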
Re: [CODE4LIB] web services and widgets: MAJAX 2, ticTOC lookup, Link/360 JSON, and Google Book Classes
On Tue, May 19, 2009 at 8:26 AM, Boheemen, Peter van peter.vanbohee...@wur.nl wrote:

> Clever idea to put the TicToc stuff 'in the cloud'. How are you going to keep it up to date?

By periodically reuploading the entire set (which takes about 15-20 mins), new or changed records can be updated. A changed record is one with a new RSS feed for the same ISSN + title combination; the data is keyed by ISSN + title. This process can be optimized by uploading only the delta (you upload .csv files, so the delta can be obtained easily via comm(1)).

Removing records is a bit of a hassle, since GAE does not provide an easy-to-use interface for that. It's possible to wipe an entire table clean by repeatedly deleting 500 records at a time (the entire set is about 19,000 records), then doing a fresh import. This can be done by uploading a console application into the cloud (http://con.appspot.com/console/help/about ). Alternatively, smaller sets of records can be deleted via a remove handler, which I haven't implemented yet; a script would need to post the data to be removed against the handler. I will do that, though, if anybody uses it. User impact is low if old records aren't removed.

A possible alternative is to have the GAE app periodically verify the validity of each requested record with a server we'd have to run. (Pulling the data straight from tictocs.ac.uk doesn't work, since it's larger than what you're allowed to fetch.) This approach would somewhat defeat the idea of the cloud, since we'd have to rely on keeping that server operational, albeit at a lower degree of availability and load. Another potential issue is the quota Google provides: you get 10 GBytes and 1.3M requests free per 24-hour period, then they start charging you ($.12 per GByte).

I think I mentioned in my post that I included a non-GAE version of the server that only requires mod_wsgi. For that standalone version, keeping the data set up to date is implemented by checking the last modification time of its local copy - it will reread its data when it detects a more recent jrss.txt in its current directory, so keeping its data up to date is as simple as periodically curling http://www.tictocs.ac.uk/text.php

- Godmar
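A minimal sketch of that "delete 500 records at a time" wipe, written against the google.appengine.ext.db API of that era and runnable from the console app mentioned above. The TocEntry model name and its properties are hypothetical stand-ins for the app's actual ISSN+title-keyed records:

  from google.appengine.ext import db

  class TocEntry(db.Model):
      # Hypothetical model standing in for the app's ISSN+title-keyed entries.
      issn = db.StringProperty()
      title = db.StringProperty()
      feed_url = db.StringProperty()

  def wipe_table(batch_size=500):
      # Fetch keys only (cheaper than full entities) and delete in batches;
      # 500 was the per-call limit on datastore batch operations.
      while True:
          keys = TocEntry.all(keys_only=True).fetch(batch_size)
          if not keys:
              break
          db.delete(keys)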
[CODE4LIB] A Book Grab by Google
fyi - [the Google Book Settlement] should not be approved

A Book Grab by Google
by Brewster Kahle
Tuesday, May 19, 2009
Washington Post | Opinions
http://www.washingtonpost.com/wp-dyn/content/article/2009/05/18/AR2009051802637.html

/st...@archive.org
Re: [CODE4LIB] A Book Grab by Google
Google isn't a dumb company. They knew this would be the result all along. The real losers here are the libraries, especially the ones that funded the packaging and transport of their materials to the Google scanning centers (because Google didn't pay for that, fyi). But hey, it looks good to be part of such a prestigious group of libraries in partnering with Google to deliver content freely* to the public!

*not free

Pardon my cynicism,
Ethan Gruber

On Tue, May 19, 2009 at 12:50 PM, st...@archive.org wrote:

> fyi - [the Google Book Settlement] should not be approved
>
> A Book Grab by Google
> by Brewster Kahle
> Tuesday, May 19, 2009
> Washington Post | Opinions
> http://www.washingtonpost.com/wp-dyn/content/article/2009/05/18/AR2009051802637.html
>
> /st...@archive.org
Re: [CODE4LIB] A Book Grab by Google
On 5/19/09 9:59 AM, Ethan Gruber ewg4x...@gmail.com wrote:

> Google isn't a dumb company. They knew this would be the result all along. The real losers here are the libraries, especially the ones that funded the packaging and transport of their materials to the Google scanning centers (because Google didn't pay for that, fyi). But hey, it looks good to be part of such a prestigious group of libraries in partnering with Google to deliver content freely* to the public!
>
> *not free
>
> Pardon my cynicism,
> Ethan Gruber

I think that's an overly pessimistic assessment. There is a growing corpus of freely available content being managed by the Hathi Trust [1] that already numbers in the hundreds of thousands of volumes and is soon likely to be over a million. Also, since government documents are included, there is a surprising number of post-1923 public domain titles. So let's not rush to throw the baby out with the bathwater.

Roy

[1] http://www.hathitrust.org/
Re: [CODE4LIB] A Book Grab by Google
Roy Tennant wrote:

> I think that's an overly pessimistic assessment. There is a growing corpus of freely available content being managed by the Hathi Trust [1] that already numbers in the hundreds of thousands of volumes and is soon likely to be over a million. Also, since government documents are included, there is a surprising number of post-1923 public domain titles. So let's not rush to throw the baby out with the bathwater.
>
> Roy

Roy, not sure what one has to do with the other. The Google settlement only relates to in-copyright, out-of-print works, so Google and the Hathi Trust appear to be a null set.

kc

--
---
Karen Coyle / Digital Library Consultant
kco...@kcoyle.net http://www.kcoyle.net
ph.: 510-540-7596 skype: kcoylenet
fx.: 510-848-3913 mo.: 510-435-8234
Re: [CODE4LIB] A Book Grab by Google
It's true that we have buns in the oven that are promising. But it's also worth noting that HathiTrust mainly came about via the Google partnership, and there are certain limitations on what they can do with the scans that came out of that partnership (currently the vast majority) as a result of agreements with Google. For instance, they can _not_ bulk re-distribute the digitized scans, not even of public domain works. They also need to have security measures in place on their website to prevent other people from conveniently bulk downloading via scraping (notice how you can't download a complete PDF of even a public domain work from HathiTrust?).

But don't get me wrong, HathiTrust is a very important project, and under good stewardship. It's not like all is lost or something; we're just at the _beginning_ of the history of the digitized book universe, not at the end. But I agree with Ethan that I hope libraries in the future actually consider their own interests when making deals with for-profit third parties. Like, literally, take a moment to consider what your interests ARE, which is the first step before making sure they are protected by the contracts you sign. I get the feeling many libraries didn't even do that.

Jonathan

Roy Tennant wrote:

> On 5/19/09 9:59 AM, Ethan Gruber ewg4x...@gmail.com wrote:
>
>> Google isn't a dumb company. They knew this would be the result all along. The real losers here are the libraries, especially the ones that funded the packaging and transport of their materials to the Google scanning centers (because Google didn't pay for that, fyi). But hey, it looks good to be part of such a prestigious group of libraries in partnering with Google to deliver content freely* to the public!
>>
>> *not free
>>
>> Pardon my cynicism,
>> Ethan Gruber
>
> I think that's an overly pessimistic assessment. There is a growing corpus of freely available content being managed by the Hathi Trust [1] that already numbers in the hundreds of thousands of volumes and is soon likely to be over a million. Also, since government documents are included, there is a surprising number of post-1923 public domain titles. So let's not rush to throw the baby out with the bathwater.
>
> Roy
>
> [1] http://www.hathitrust.org/
Re: [CODE4LIB] A Book Grab by Google
On May 19, 2009, at 1:08 PM, Roy Tennant wrote:

>> But hey, it looks good to be part of such a prestigious group of libraries in partnering with Google to deliver content freely* to the public!
>
> ...I think that's an overly pessimistic assessment. There is a growing corpus of freely available content being managed by the Hathi Trust [1]...
>
> [1] http://www.hathitrust.org/

Yea, but, but. Yes, I do think Google is making a book grab, especially when it comes to orphan works, but on the other hand, the HathiTrust is no bed of roses either, because even for public domain works it is not possible to download the whole book without going through semi-heroic efforts.

I applaud the Internet Archive and the Open Content Alliance's efforts. archive.org++

BTW, we are sponsoring a mini-symposium on the topic of mass digitization here at Notre Dame, tomorrow: http://www.library.nd.edu/symposium/

--
Eric Lease Morgan
Hesburgh Libraries, University of Notre Dame
(574) 631-8604
Re: [CODE4LIB] A Book Grab by Google
> BTW, we are sponsoring a mini-symposium on the topic of mass digitization here at Notre Dame, tomorrow: http://www.library.nd.edu/symposium/

Nice timing.

--joe
Re: [CODE4LIB] A Book Grab by Google
> BTW, we are sponsoring a mini-symposium on the topic of mass digitization here at Notre Dame, tomorrow:

Any protesters expected? ;)

T
Re: [CODE4LIB] A Book Grab by Google [hack]
On May 19, 2009, at 1:24 PM, Eric Lease Morgan wrote:

> I applaud the Internet Archive and the Open Content Alliance's efforts. archive.org++

Try this hack with Google Books, not. (The wget flags mirror each item's directory, keeping the plain-text, PDF, page-image, and MARC XML files while rejecting the black-and-white PDFs.)

  $ echo http://ia300206.us.archive.org/3/items/librariesreaders00fostuoft/ >> libraries.urls
  $ echo http://ia310827.us.archive.org/0/items/developmentofchi00tancuoft/ >> libraries.urls
  $ echo http://ia310832.us.archive.org/2/items/rulesregulations00brituoft/ >> libraries.urls
  $ echo 'wget -erobots=off --wait 1 -np -m -nd -A _djvu.txt,.pdf,.gif,_marc.xml -R _bw.pdf -i $1' > mirror.sh
  $ chmod +x mirror.sh
  $ ./mirror.sh libraries.urls

--
ELM
Re: [CODE4LIB] A Book Grab by Google [hack]
On May 19, 2009, at 10:40 AM, Eric Lease Morgan wrote:

> On May 19, 2009, at 1:24 PM, Eric Lease Morgan wrote:
>
>> I applaud the Internet Archive and the Open Content Alliance's efforts. archive.org++
>
> Try this hack with Google Books, not.
>
>   $ echo http://ia300206.us.archive.org/3/items/librariesreaders00fostuoft/ >> libraries.urls
>   $ echo http://ia310827.us.archive.org/0/items/developmentofchi00tancuoft/ >> libraries.urls
>   $ echo http://ia310832.us.archive.org/2/items/rulesregulations00brituoft/ >> libraries.urls
>   $ echo 'wget -erobots=off --wait 1 -np -m -nd -A _djvu.txt,.pdf,.gif,_marc.xml -R _bw.pdf -i $1' > mirror.sh
>   $ chmod +x mirror.sh
>   $ ./mirror.sh libraries.urls

Here is a script that will let you download all the books from archive.org:

http://blog.openlibrary.org/2008/11/24/bulk-access-to-ocr-for-1-million-books/

You'll have to slightly modify it to download the format you want...

-raj
Re: [CODE4LIB] A Book Grab by Google [hack]
also, if your script can handle a redirect, you can use our locator to find each item, e.g.

  http://www.archive.org/download/librariesreaders00fostuoft/
  http://www.archive.org/download/developmentofchi00tancuoft/
  http://www.archive.org/download/rulesregulations00brituoft/

as the data does migrate occasionally for maintenance.

/st...@archive.org

On 5/19/09 10:51 AM, raj kumar wrote:

> On May 19, 2009, at 10:40 AM, Eric Lease Morgan wrote:
>
>> On May 19, 2009, at 1:24 PM, Eric Lease Morgan wrote:
>>
>>> I applaud the Internet Archive and the Open Content Alliance's efforts. archive.org++
>>
>> Try this hack with Google Books, not.
>>
>>   $ echo http://ia300206.us.archive.org/3/items/librariesreaders00fostuoft/ >> libraries.urls
>>   $ echo http://ia310827.us.archive.org/0/items/developmentofchi00tancuoft/ >> libraries.urls
>>   $ echo http://ia310832.us.archive.org/2/items/rulesregulations00brituoft/ >> libraries.urls
>>   $ echo 'wget -erobots=off --wait 1 -np -m -nd -A _djvu.txt,.pdf,.gif,_marc.xml -R _bw.pdf -i $1' > mirror.sh
>>   $ chmod +x mirror.sh
>>   $ ./mirror.sh libraries.urls
>
> Here is a script that will let you download all the books from archive.org:
>
> http://blog.openlibrary.org/2008/11/24/bulk-access-to-ocr-for-1-million-books/
>
> You'll have to slightly modify it to download the format you want...
>
> -raj
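In Python, following the locator's redirect takes no extra work, since urllib follows HTTP redirects by default. A minimal sketch (Python 3 shown, identifiers taken from the examples above):

  import urllib.request

  def resolve_item(identifier):
      # The locator redirects to whichever datanode currently holds the item;
      # geturl() reports the final URL after any redirects were followed.
      locator = "http://www.archive.org/download/%s/" % identifier
      with urllib.request.urlopen(locator) as resp:
          return resp.geturl()

  for item in ("librariesreaders00fostuoft",
               "developmentofchi00tancuoft",
               "rulesregulations00brituoft"):
      print(item, "->", resolve_item(item))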
Re: [CODE4LIB] vuFIND -- long term direction
> No intention to spoil our users and our fun with vuFIND. Just taking stock after 1-2 years of working with vuFIND, here are a few general questions,

Hi Yaaqov. Here's my take (and I'm cc'ing several of my colleagues in case they want to add or dispute anything):

> 1. is vuFIND primarily an experiment? if not why haven't more sites switched to production like VU or NLA?

We consider it beta but still more than an experiment. We're planning enhancements this summer: regenerating our Solr index with SolrMarc 2.0 (to get the spell checker, linked 880s, etc.), implementing nightly syncs, implementing patron empowerment features (CAS + the MyResearch module), and adding server redundancy. Longer term we're planning to enhance the non-Roman-script functionality, work on search engine optimization, and add non-MARC metadata (journal articles? visual images? government documents? we're still debating).

> 2. is SOLR indexing satisfactory?

I would say 'yes', though there's certainly a lot of customization we'd like to do.

> 3. how many staff are needed for vuFIND's viable maintenance? for developing/adding features?

We've had a project manager and a programmer analyst each at 50% over the past year, and smaller amounts of time allocated from usability, web design, systems, and other kinds of librarians. I would say we could have used much more (and we'll be ramping up this summer and beyond).

> 4. do we want vuFIND to measure up to the ILS (Voyager, Aleph, III, etc.) OPAC, or do we like it as an alternative, for its discovery tools?

As others have pointed out, VuFind is already superior to traditional OPACs in certain ways: relevancy ranking is extremely powerful. Faceted navigation is great, too, but there's a lot of tweaking that needs to be done in our case. The traditional OPAC is currently better at integrating authority files and displaying complex indexes (e.g., authors sub-arranged by titles).

> 5. what is plan B at your institution for when the vuFIND guru leaves?

We're planning to continue running Voyager and VuFind in parallel. Developments are moving so fast that it's hard to predict the landscape 5 years from now, but in the meantime VuFind gives us an excellent open platform both for production and R&D.

> 6. do we need someone to co-work with Andrew on the installation package? on keeping track of developments, road maps?

Great idea. I think some of my Yale colleagues who are real programmers are planning to contribute code. I hope to contribute in other ways (e.g., on integration of authority files and analysis/enhancement of non-Western-language functionality).

> 7. which collections, other than the catalog and OAI/repositories have we added? which API to other collections have we installed? are now tested?

So far we've only indexed our catalog MARC data. We've done some testing of other record sets (e.g., visual image records).

> I'm sure many of us cope with these questions and could benefit from the variety of our replies. Kindest thanks,
>
> Ya'aqov Ziso, Electronic Resource Management Librarian, Rowan University 856 256 4804
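For anyone curious what the faceting mentioned under question 4 looks like at the Solr level, here is a minimal sketch of querying VuFind's underlying index directly. The port, the 'biblio' core name, and the 'format'/'language' facet fields are typical of a VuFind install of this vintage but are assumptions; adjust them to your own schema:

  import json
  import urllib.parse
  import urllib.request

  params = {
      "q": "history of science",              # relevancy-ranked keyword query
      "wt": "json",                           # ask Solr for a JSON response
      "rows": "10",
      "facet": "true",
      "facet.field": ["format", "language"],  # assumed VuFind facet fields
  }
  # doseq=True emits one facet.field parameter per list entry.
  url = ("http://localhost:8080/solr/biblio/select?"
         + urllib.parse.urlencode(params, doseq=True))
  with urllib.request.urlopen(url) as resp:
      result = json.load(resp)
  print(result["response"]["numFound"], "hits")
  print(result["facet_counts"]["facet_fields"]["format"])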