Re: [CODE4LIB] hathitrust research center workset browser
Eric, what happens if you access this from a non-HT institution? When I go to HT I am often unable to download public domain titles because they aren't available to members of the general public.

kc

On 5/26/15 8:30 AM, Eric Lease Morgan wrote:
> In my copious spare time I have hacked together a thing I’m calling the HathiTrust Research Center Workset Browser, a (fledgling) tool for doing “distant reading” against corpora from the HathiTrust. [1]
>
> The idea is to: 1) create, refine, or identify a HathiTrust Research Center workset of interest (your corpus), 2) feed the workset’s rsync file to the Browser, 3) have the Browser download, index, and analyze the corpus, and 4) enable the reader to search, browse, and interact with the result of the analysis. With varying success, I have done this with a number of worksets on topics ranging from literature and philosophy to Rome and cookery. The best working examples are the ones from Thoreau and Austen. [2, 3] The others are still buggy.
>
> As a further example, the Browser can/will create reports describing the corpus as a whole. This analysis includes the size of a corpus measured in pages as well as words, date ranges, word frequencies, and selected items of interest based on pre-set “themes”: usage of color words, names of “great” authors, and a set of timeless ideas. [4] This report is based on more fundamental reports such as frequency tables, a “catalog”, and lists of unique words. [5, 6, 7, 8]
>
> The whole thing is written in a combination of shell and Python scripts. It should run on just about any out-of-the-box Linux or Macintosh computer. Take a look at the code. [9] No special libraries needed. (“Famous last words.”) In its current state, it is very Unix-y. Everything is done from the command line. Lots of plain text files and the exploitation of STDIN and STDOUT. Like a Renaissance cartoon, the Browser, in its current state, is only a sketch. Only later will a more full-bodied, Web-based interface be created.
>
> The next steps are numerous and listed in no priority order: putting the whole thing on GitHub, outputting the reports in generic formats so other things can easily read them, improving the terminal-based search interface, implementing a Web-based search interface, writing advanced programs in R that chart and graph analysis, providing a means for comparing & contrasting two or more items from a corpus, indexing the corpus with a (real) indexer such as Solr, writing a “cookbook” describing how to use the browser to do “kewl” things, making the metadata of corpora available as Linked Data, etc.
>
> Want to give it a try? For a limited period of time, go to the HathiTrust Research Center Portal, create (refine or identify) a collection of personal interest, use the Algorithms tool to export the collection's rsync file, and send the file to me. I will feed the rsync file to the Browser, and then send you the URL pointing to the results. [10] Let’s see what happens.
>
> Fun with public domain content, text mining, and the definition of librarianship.
>
> Links
>
> [1] HTRC Workset Browser - http://bit.ly/workset-browser
> [2] Thoreau - http://bit.ly/browser-thoreau
> [3] Austen - http://bit.ly/browser-austen
> [4] Thoreau report - http://ntrda.me/1LD3xds
> [5] Thoreau dictionary (frequency list) - http://bit.ly/thoreau-dictionary
> [6] usage of color words in Thoreau - http://bit.ly/thoreau-colors
> [7] unique words in the corpus - http://bit.ly/thoreau-unique
> [8] Thoreau “catalog” - http://bit.ly/thoreau-catalog
> [9] source code - http://ntrda.me/1Q8pPoI
> [10] HathiTrust Research Center - https://sharc.hathitrust.org
>
> -- Eric Lease Morgan, Librarian, University of Notre Dame

--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
m: +1-510-435-8234
skype: kcoylenet/+1-510-984-3600
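For readers wondering what the frequency tables and the "color words" theme report in Eric's message might involve, here is a minimal Python sketch. It is not the Browser's actual code (see the source link in the message for that); the function names, the tokenizing rule, and the short COLOR_WORDS list are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical theme list; the Browser's real word lists are not shown in the post.
COLOR_WORDS = {"red", "green", "blue", "white", "black", "yellow"}


def tokenize(text):
    """Lowercase the text and return its runs of ASCII letters as words."""
    return re.findall(r"[a-z]+", text.lower())


def frequency_table(text):
    """Count how often each word occurs in the text."""
    return Counter(tokenize(text))


def theme_usage(frequencies, theme_words):
    """Keep only the counts for words that belong to a theme."""
    return {word: count for word, count in frequencies.items() if word in theme_words}


if __name__ == "__main__":
    # Sample text standing in for one volume of a corpus.
    sample = "The green pond lay under a blue sky, and the blue deepened at dusk."
    table = frequency_table(sample)
    # Tab-delimited output so other command-line tools can read it.
    for word, count in table.most_common(5):
        print(f"{count}\t{word}")
    print(theme_usage(table, COLOR_WORDS))
```

A real corpus would need more careful tokenization (hyphenation, OCR noise, non-English text), so treat this as a starting point only.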
[CODE4LIB] Director, Southern Regional Library Facility (UCLA)
Code4Lib colleagues - UCLA is looking for a Director of the Southern Regional Library Facility, a high density storage facility that serves the 10 campuses of the University of California. In addition to managing the facility and staff, this position is responsible for strategic initiatives in collection management and shared print as well as a robust digitization operation.

Speaking as a technology-focused librarian who serves as the Director of the Northern Regional Library Facility, I can say that there are a lot of opportunities in this position to explore automation, digital preservation, shared collections and other exciting topics in LIS. If you would like to talk more about the position please feel free to get in touch.

Erik

--
Erik Mitchell
Associate University Librarian
Director of Digital Initiatives and Collaborative Services
Director, Northern Regional Library Facility
University of California, Berkeley
emitch...@berkeley.edu http://erikmitchell.info

http://www.library.ucla.edu/about/employment-human-resources/staff-positions

Under the general direction of the Associate University Librarian (AUL) for Collection Management and Scholarly Communication, the Director of the Southern Regional Library Facility (SRLF) and Collaborative Shared Print Programs is responsible for the leadership, management and operations of the SRLF and for Collaborative Shared Print Programs. The Director manages the UC Southern Regional Library Facility (SRLF), a university-wide academic support program stewarding library materials including special collections, manuscripts, archives, audio-visual collections and content for the five southern campuses and stewarding the materials of the UC Shared Print Archives Program.
Responsibilities include planning for the growth of collaborative shared print activities, positioning the SRLF to play a leadership role in a network of shared print repositories, implementing innovative technical and other service enhancements to improve cross-institutional sharing and management of collections, and coordinating and overseeing preservation imaging services including large-scale digitization and reformatting.

SRLF is a large-scale, high density, environmentally controlled collection management facility located on the UCLA campus, with capacity for seven million volume equivalents. It serves the five southern campuses of the University of California: Irvine, Los Angeles, Riverside, San Diego, and Santa Barbara, as well as the northern UC campuses.

The SRLF Preservation Imaging Service enables libraries to preserve fragile print materials through microfilm or digital formatting, and to share the resulting images with other libraries and the general public through Internet/Web access to the UCLA Digital Library and/or the California Digital Library, or through the less vulnerable medium of microfilm.

The SRLF participates in the UC Shared Print Archive Program, providing storage for the print copy of select journal titles. The print archive programs held at the SRLF have grown to include the JSTOR Archive and UC Shared Print for Licensed Content (with content fully accessible online), and the Western Regional Storage Trust (WEST Archive) that includes 100+ member libraries and more than 400K journal volumes archived across the WEST membership.

Applicants will be able to view and apply for this job until the Posting Expiration Date of 06-15-2015. You may view your posting and the applicants that have applied for this position by accessing UCLA (https://hr.jobs.ucla.edu).
Re: [CODE4LIB] getting started with Drupal for library website
Hi Ken,

These tasks are pretty trivial with a custom content type for your databases and Views. I've done the exact setup you mention (a database list, both grouped by subject and A to Z) at my former workplace. Here's what the result looks like: http://info.chesapeake.edu/lrc/library/academic-databases

The Google Analytics module tracks outbound clicks; it does so either by default or via a single option in its settings.

If you have a rather small number of databases, I think doing this in pure Drupal will pay off in terms of ease and content reusability within the CMS. CUFTS or another ERM system is going to be more robust and suitable for a larger collection.

On Wed, May 27, 2015 at 08:17 Mark Jordan wrote:
> > There's CUFTS, which is no longer under development as far as I know:
> > http://researcher.sfu.ca/cufts
>
> CUFTS is under active development. Feel free to contact
> researcher-supp...@sfu.ca if you'd like more info.
>
> Mark
Re: [CODE4LIB] getting started with Drupal for library website
> There's CUFTS, which is no longer under development as far as I know: > http://researcher.sfu.ca/cufts CUFTS is under active development. Feel free to contact researcher-supp...@sfu.ca if you'd like more info. Mark
Re: [CODE4LIB] getting started with Drupal for library website
What you are describing sounds quite a bit like a knowledge base. There are a lot of commercial solutions for these types of things, but open source options are a bit more limited.

There's CUFTS, which is no longer under development as far as I know: http://researcher.sfu.ca/cufts

There's also GOKb, which is under development and worth keeping an eye on: http://gokb.org/preview

I've not used either of these products, so unfortunately, I can't vouch for either one. But hopefully this gives you a starting point to work from.

Regards,
Karl Holten
Systems Integration Specialist
SWITCH Inc
414-382-6711

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Ken Irwin
Sent: Wednesday, May 27, 2015 8:02 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] getting started with Drupal for library website

Hi folks,

Thanks to all who responded a few weeks ago to my inquiry about updating the code on my library's website. Many folks suggested moving to a CMS, and I'm starting to look into that possibility, and particularly Drupal.

In doing so, I'm hoping not to re-invent the wheel, and I'm hoping that maybe someone has already designed a basic infrastructure to replace the backbone of our current website, namely.

Under our current arrangement we have an interlocking set of databases that performs some basic library functions:

There's a database table that lists all of the databases we subscribe to. That database feeds a user interface that:

* lists databases
* counts click-thrus
* routes traffic to our proxy server when appropriate
* can list databases by subject area (defined in a table of subject associations)

There's also a back-end UI to create subject/database associations, display click-thru stats, and generate EZproxy config files based on the table of library databases.

Does anyone know of a freely-available set of modules/pages/etc that's already designed to do this sort of thing? In my imagination, lots of libraries would want basically this same thing, customized to their own particular needs, and maybe we wouldn't each have to start from scratch.

Any advice?

Thanks
Ken
Re: [CODE4LIB] getting started with Drupal for library website
I can't speak to doing this specifically on Drupal, but in terms of measuring clicks I would simplify. We use Google Analytics, and at each place I've been I've just set up some custom event-tracking code to record specific types of clicks. Here at the TMC Library we're now recording database clicks with that mechanism.

In terms of a database list, I've gone a few routes. When I was at the University of New Mexico, where I had no access to backend databases for most of my tenure, I built an A-Z list in XML that plugged into our junky CMS (Cascade Server). It worked quite well. However, I'm more interested in extracting things like that from a central data node, like Serials Solutions or Intota. Here at TMC we're using Intota, and I've built a PHP script to extract the contents of one of the reports and load them into a MySQL database to capture that information. At my last library we used Serials Solutions, and, while I didn't plug that into the website, I did have to build a script that could parse a Serials Solutions CSV file into a Google Books XML format so that Ex Libris' rather unfortunate Primo tool could make sense of it for discovery purposes. That file, of course, covered individual publications as well as other linked objects. It's available on my GitHub site.

Best regards,

Jason Bengtson, MLIS, MA
Innovation Architect
Houston Academy of Medicine-The Texas Medical Center Library
1133 John Freeman Blvd
Houston, TX 77030
http://library.tmc.edu/
www.jasonbengtson.com

On Wed, May 27, 2015 at 8:01 AM, Ken Irwin wrote:
> Hi folks,
>
> Thanks to all who responded a few weeks ago to my inquiry about updating
> the code on my library's website. Many folks suggested moving to a CMS, and
> I'm starting to look into that possibility, and particularly Drupal.
>
> In doing so, I'm hoping not to re-invent the wheel, and I'm hoping that
> maybe someone has already designed a basic infrastructure to replace the
> backbone of our current website, namely.
>
> Under our current arrangement we have an interlocking set of databases
> that performs some basic library functions:
>
> There's a database table that lists all of the databases we subscribe to.
> That database feeds a user interface that:
>
> * lists databases
> * counts click-thrus
> * routes traffic to our proxy server when appropriate
> * can list databases by subject area (defined in a table of
>   subject associations)
>
> There's also a back-end UI to create subject/database associations,
> display click-thru stats, and generate EZproxy config files based on the
> table of library databases.
>
> Does anyone know of a freely-available set of modules/pages/etc that's
> already designed to do this sort of thing? In my imagination, lots of
> libraries would want basically this same thing, customized to their own
> particular needs, and maybe we wouldn't each have to start from scratch.
>
> Any advice?
>
> Thanks
> Ken
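The report-to-database step Jason describes (a PHP script that loads a vendor report into MySQL) could look roughly like the following. This is a sketch in Python, not his script: it swaps in the standard library's sqlite3 for MySQL so it runs anywhere, and the "Title" and "URL" column names and the sample rows are assumptions, since the actual report layout is not given in the thread.

```python
import csv
import io
import sqlite3

# Hypothetical report contents; a real vendor export will have different columns.
SAMPLE_REPORT = (
    "Title,URL\n"
    "Open Archive,https://archive.example.org/\n"
    "Example Index,https://index.example.com/\n"
)


def load_report(csv_file, connection):
    """Load rows from a CSV report into a 'databases' table; return the row count."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS databases (title TEXT PRIMARY KEY, url TEXT)"
    )
    rows = [(row["Title"], row["URL"]) for row in csv.DictReader(csv_file)]
    # INSERT OR REPLACE lets the same report be reloaded without duplicating rows.
    connection.executemany(
        "INSERT OR REPLACE INTO databases (title, url) VALUES (?, ?)", rows
    )
    connection.commit()
    return len(rows)


def a_to_z(connection):
    """Return database titles in case-insensitive alphabetical order."""
    cursor = connection.execute(
        "SELECT title FROM databases ORDER BY title COLLATE NOCASE"
    )
    return [title for (title,) in cursor]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_report(io.StringIO(SAMPLE_REPORT), conn)
    for title in a_to_z(conn):
        print(title)
```

Keying the table on the title is a simplification; a vendor-supplied identifier, if the report has one, would make a sturdier key.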
Re: [CODE4LIB] hathitrust research center workset browser [call for worksets]
On May 26, 2015, at 11:30 AM, Eric Lease Morgan wrote:
> In my copious spare time I have hacked together a thing I’m calling the
> HathiTrust Research Center Workset Browser, a (fledgling) tool for doing
> “distant reading” against corpora from the HathiTrust. [0]
>
> [0] introductory Workset Browser blog posting - http://ntrda.me/1FUGP2g

Help me put my fledgling Browser through some paces; this is a call for HathiTrust Research Center worksets.

For a limited period of time, go to the HathiTrust Research Center Portal, create (refine or identify) a collection of personal interest, use the Algorithms tool to export the collection's rsync file, and send the file to me. [1] I will feed the rsync file to the Browser, and then send you the URL pointing to the results. Let’s see what happens.

[1] HathiTrust Research Center Portal - https://sharc.hathitrust.org

-- Eric Morgan
[CODE4LIB] getting started with Drupal for library website
Hi folks,

Thanks to all who responded a few weeks ago to my inquiry about updating the code on my library's website. Many folks suggested moving to a CMS, and I'm starting to look into that possibility, and particularly Drupal.

In doing so, I'm hoping not to re-invent the wheel, and I'm hoping that maybe someone has already designed a basic infrastructure to replace the backbone of our current website, namely.

Under our current arrangement we have an interlocking set of databases that performs some basic library functions:

There's a database table that lists all of the databases we subscribe to. That database feeds a user interface that:

* lists databases
* counts click-thrus
* routes traffic to our proxy server when appropriate
* can list databases by subject area (defined in a table of subject associations)

There's also a back-end UI to create subject/database associations, display click-thru stats, and generate EZproxy config files based on the table of library databases.

Does anyone know of a freely-available set of modules/pages/etc that's already designed to do this sort of thing? In my imagination, lots of libraries would want basically this same thing, customized to their own particular needs, and maybe we wouldn't each have to start from scratch.

Any advice?

Thanks
Ken
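To make the moving parts in Ken's description concrete, here is a small Python sketch of three of them: routing links through a proxy, grouping databases by subject, and emitting a basic EZproxy stanza. It is not Ken's code and not a Drupal module. The proxy prefix, the sample records, and the field names are all made up for illustration, and deriving the Domain line from the URL's hostname is a simplification; working stanzas often need vendor-specific directives.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical proxy prefix; substitute your own EZproxy login URL.
PROXY_PREFIX = "https://ezproxy.example.edu/login?url="

# Hypothetical records standing in for the table of subscribed databases.
DATABASES = [
    {"title": "Example Index", "url": "https://index.example.com/search",
     "proxied": True, "subjects": ["History", "Art"]},
    {"title": "Open Archive", "url": "https://archive.example.org/",
     "proxied": False, "subjects": ["History"]},
]


def link_for(db):
    """Return the link users should follow, via the proxy when required."""
    return PROXY_PREFIX + db["url"] if db["proxied"] else db["url"]


def by_subject(databases):
    """Map each subject to an alphabetical list of database titles."""
    groups = defaultdict(list)
    for db in databases:
        for subject in db["subjects"]:
            groups[subject].append(db["title"])
    return {subject: sorted(titles) for subject, titles in sorted(groups.items())}


def ezproxy_stanza(db):
    """Build a minimal Title/URL/Domain stanza for one proxied database."""
    host = urlparse(db["url"]).hostname
    return f"Title {db['title']}\nURL {db['url']}\nDomain {host}\n"


if __name__ == "__main__":
    for subject, titles in by_subject(DATABASES).items():
        print(subject, titles)
    for db in DATABASES:
        if db["proxied"]:
            print(ezproxy_stanza(db))
```

Click-thru counting is left out here; it could be handled by a redirect script that logs each request before forwarding, or by analytics event tracking as suggested elsewhere in the thread.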
[CODE4LIB] Final Call and See you soon for Code4Lib North
Hello All,

In about a week the 2015 Code4Lib North meetup will be held at the St. Catharines Public Library Downtown branch. There are still a couple of spots available! If you can only make one of the two days, that works too.

Details on the wiki: http://wiki.code4lib.org/index.php/North#Code4Lib_North:_the_Sixth._St._Catharines_Public_Library.2C_June_4_.26_5.2C_2015

To all those already registered, thanks very much for your support. Do consider adding a topic to the wiki, and see you next week!

Thanks,
Tim

==
Tim Ribaric
Acting Head, Library Systems & Technologies
Digital Services Librarian
Computer Science & Philosophy Liaison Librarian
@elibtronic