Re: [CODE4LIB] hathitrust research center workset browser

2015-05-27 Thread Karen Coyle
Eric, what happens if you access this from a non-HT institution? When I 
go to HT I am often unable to download public domain titles because they 
aren't available to members of the general public.


kc

On 5/26/15 8:30 AM, Eric Lease Morgan wrote:

In my copious spare time I have hacked together a thing I’m calling the 
HathiTrust Research Center Workset Browser, a (fledgling) tool for doing 
“distant reading” against corpora from the HathiTrust. [1]

The idea is to: 1) create, refine, or identify a HathiTrust Research Center 
workset of interest — your corpus, 2) feed the workset’s rsync file to the 
Browser, 3) have the Browser download, index, and analyze the corpus, and 4) 
enable the reader to search, browse, and interact with the result of the 
analysis. With varying success, I have done this with a number of worksets 
on topics ranging from literature and philosophy to Rome and cookery. The best 
working examples are the ones from Thoreau and Austen. [2, 3] The others are 
still buggy.

As a further example, the Browser can/will create reports describing the corpus 
as a whole. This analysis includes the size of a corpus measured in pages as 
well as words, date ranges, word frequencies, and selected items of interest 
based on pre-set “themes”: usage of color words, names of “great” authors, and 
a set of timeless ideas. [4] This report is based on more fundamental reports 
such as frequency tables, a “catalog”, and lists of unique words. [5, 6, 7, 8]
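
A minimal Python sketch of the kind of frequency table, unique-word list, and color-word "theme" count described above. This is not the Browser's actual code: the tokenizer, the COLOR_WORDS list, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical theme list; the Browser's real color lexicon is not shown in the post.
COLOR_WORDS = {"red", "green", "blue", "yellow", "white", "black", "brown"}


def tokenize(text):
    """Lowercase the text and return its runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_table(text):
    """Map each distinct word to the number of times it occurs."""
    return Counter(tokenize(text))


def theme_counts(freq, theme):
    """Return counts for only those theme words that occur at least once."""
    return {word: freq[word] for word in sorted(theme) if freq[word] > 0}


# Invented sample standing in for one item of a corpus.
sample = "The green pond lay under a blue sky; the green woods stood beyond the pond."

freq = frequency_table(sample)
unique = sorted(freq)                      # the "unique words" list
colors = theme_counts(freq, COLOR_WORDS)   # the "color words" theme

print("words:", sum(freq.values()))
print("unique words:", len(unique))
print("most common:", freq.most_common(3))
print("colors:", colors)
```

Because the functions only take and return plain text structures, the same idea fits the STDIN/STDOUT style mentioned in the post: read a file on standard input, print one "word count" pair per line.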

The whole thing is written in a combination of shell and Python scripts. It 
should run on just about any out-of-the-box Linux or Macintosh computer. Take a 
look at the code. [9] No special libraries needed. (“Famous last words.”) In 
its current state, it is very Unix-y. Everything is done from the command line. 
Lots of plain text files and the exploitation of STDIN and STDOUT. Like a 
Renaissance cartoon, the Browser, in its current state, is only a sketch. Only 
later will a more full-bodied, Web-based interface be created.

The next steps are numerous and listed in no priority order: putting the whole 
thing on GitHub, outputting the reports in generic formats so other things can 
easily read them, improving the terminal-based search interface, implementing a 
Web-based search interface, writing advanced programs in R that chart and graph 
analysis, providing a means for comparing & contrasting two or more items from a 
corpus, indexing the corpus with a (real) indexer such as Solr, writing a 
“cookbook” describing how to use the browser to do “kewl” things, making the 
metadata of corpora available as Linked Data, etc.

Want to give it a try? For a limited period of time, go to the HathiTrust 
Research Center Portal, create (refine or identify) a collection of personal 
interest, use the Algorithms tool to export the collection's rsync file, and 
send the file to me. I will feed the rsync file to the Browser, and then send 
you the URL pointing to the results. [10] Let’s see what happens.

Fun with public domain content, text mining, and the definition of 
librarianship.

Links

[1] HTRC Workset Browser - http://bit.ly/workset-browser
[2] Thoreau - http://bit.ly/browser-thoreau
[3] Austen - http://bit.ly/browser-austen
[4] Thoreau report - http://ntrda.me/1LD3xds
[5] Thoreau dictionary (frequency list) - http://bit.ly/thoreau-dictionary
[6] usage of color words in Thoreau - http://bit.ly/thoreau-colors
[7] unique words in the corpus - http://bit.ly/thoreau-unique
[8] Thoreau “catalog” - http://bit.ly/thoreau-catalog
[9] source code - http://ntrda.me/1Q8pPoI
[10] HathiTrust Research Center - https://sharc.hathitrust.org

—
Eric Lease Morgan, Librarian
University of Notre Dame


--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
m: +1-510-435-8234
skype: kcoylenet/+1-510-984-3600


[CODE4LIB] Director, Southern Regional Library Facility (UCLA)

2015-05-27 Thread Erik Mitchell
Code4Lib colleagues -


UCLA is looking for a Director of the Southern Regional Library Facility, a
high density storage facility that serves the 10 campuses of the University of
California.  In addition to managing the facility and staff, this position
is responsible for strategic initiatives in collection management and
shared print as well as a robust digitization operation.


Speaking as a technology-focused librarian who serves as the Director of
the Northern Regional Library Facility, I can say that there are a lot of
opportunities in this position to explore automation, digital preservation,
shared collections and other exciting topics in LIS.  If you would like to
talk more about the position please feel free to get in touch.


Erik

-- 
Erik Mitchell
Associate University Librarian
Director of Digital Initiatives and Collaborative Services
Director, Northern Regional Library Facility
University of California, Berkeley
emitch...@berkeley.edu
http://erikmitchell.info




http://www.library.ucla.edu/about/employment-human-resources/staff-positions



Under the general direction of the Associate University Librarian (AUL) for
Collection Management and Scholarly Communication, the Director of the
Southern Regional Library Facility (SRLF) and Collaborative Shared Print
Programs is responsible for the leadership, management and operations of
the SRLF and for Collaborative Shared Print Programs. The Director manages
the UC Southern Regional Library Facility (SRLF), a university-wide
academic support program stewarding library materials including special
collections, manuscripts, archives, audio-visual collections and content
for the five southern campuses and stewarding the materials of the UC
Shared Print Archives Program. Responsibilities include the planning for
the growth of collaborative shared print activities, positioning the SRLF
to play a leadership role in a network of shared print repositories,
implementing innovative technical and other service enhancements to improve
cross institutional sharing and management of collections and coordinating
and overseeing preservation imaging services including large scale
digitization and reformatting.

SRLF is a large-scale, high density, environmentally controlled collection
management facility located on the UCLA campus, with capacity for seven
million volume equivalents. It serves the five southern campuses of the
University of California: Irvine, Los Angeles, Riverside, San Diego, and
Santa Barbara, as well as the northern UC campuses.

The SRLF Preservation Imaging Service enables libraries to preserve fragile
print materials through microfilm or digital formatting, and to share the
resulting images with other libraries and the general public through
Internet/Web access to the UCLA Digital Library and/or the California
Digital Library, or through the less vulnerable medium of microfilm.

The SRLF participates in the UC Shared Print Archive Program, providing
storage for the print copy of select journal titles. The print archive
programs held at the SRLF have grown to include the JSTOR Archive and UC
Shared Print for Licensed Content (with content fully accessible online),
and the Western Regional Storage Trust (WEST Archive) that includes 100+
member libraries and more than 400K journal volumes archived across the
WEST membership.

Applicants will be able to view and apply for this job until the Posting
Expiration Date of 06-15-2015. You may view your posting and the applicants
that have applied for this position by accessing UCLA (
https://hr.jobs.ucla.edu).


Re: [CODE4LIB] getting started with Drupal for library website

2015-05-27 Thread Eric Phetteplace
Hi Ken,

These tasks are pretty trivial with a custom content type for your
databases and Views. I've done the exact setup you mention (a database list,
both grouped by subject and A to Z) at my former workplace. Here's what the
result looks like: http://info.chesapeake.edu/lrc/library/academic-databases

The Google Analytics module tracks outbound clicks; that's either on by default
or a single option in its settings.

If you have a rather small number of databases, I think doing this in pure
Drupal will pay off in terms of ease and content reusability within the
CMS. CUFTS or another ERM system is going to be more robust and suitable
for a larger collection.

On Wed, May 27, 2015 at 08:17 Mark Jordan wrote:

> > There's CUFTS, which is no longer under development as far as I know:
> > http://researcher.sfu.ca/cufts
>
> CUFTS is under active development. Feel free to contact
> researcher-supp...@sfu.ca if you'd like more info.
>
> Mark
>


Re: [CODE4LIB] getting started with Drupal for library website

2015-05-27 Thread Mark Jordan
> There's CUFTS, which is no longer under development as far as I know:
> http://researcher.sfu.ca/cufts

CUFTS is under active development. Feel free to contact 
researcher-supp...@sfu.ca if you'd like more info. 

Mark 


Re: [CODE4LIB] getting started with Drupal for library website

2015-05-27 Thread Karl Holten
What you are describing sounds quite a bit like a knowledge base. There are a lot 
of commercial solutions for these types of things, but open source options are 
a bit more limited. 

There's CUFTS, which is no longer under development as far as I know: 
http://researcher.sfu.ca/cufts

There's also GOKb, which is under development and worth keeping an eye on: 
http://gokb.org/preview

I've not used either of these products, so unfortunately, I can't vouch for 
either one. But hopefully this gives you a starting point to work from.

Regards,
Karl Holten
Systems Integration Specialist
SWITCH Inc
414-382-6711

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Ken 
Irwin
Sent: Wednesday, May 27, 2015 8:02 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] getting started with Drupal for library website

Hi folks,

Thanks to all who responded a few weeks ago to my inquiry about updating the 
code on my library's website. Many folks suggested moving to a CMS, and I'm 
starting to look into that possibility, and particularly Drupal.

In doing so, I'm hoping not to re-invent the wheel, and I'm hoping that maybe 
someone has already designed a basic infrastructure to replace the backbone of 
our current website, namely.

Under our current arrangement we have an interlocking set of databases that 
performs some basic library functions:

There's a database table that lists all of the databases we subscribe to. That 
database feeds a user interface that:

* lists databases

* counts click-thrus

* routes traffic to our proxy server when appropriate

* can list databases by subject area (defined in a table of subject 
associations)

There's also a back-end UI to create subject/database associations, display 
click-thru stats, and generate EZproxy config files based on the table of library 
databases.

Does anyone know of a freely-available set of modules/pages/etc that's already 
designed to do this sort of thing? In my imagination, lots of libraries would 
want basically this same thing, customized to their own particular needs, 
and maybe we wouldn't each have to start from scratch.

Any advice?

Thanks
Ken


Re: [CODE4LIB] getting started with Drupal for library website

2015-05-27 Thread Jason Bengtson
I can't speak to doing this specifically on Drupal, but in terms of
measuring clicks I would simplify. We use Google Analytics, and at each
place I've been I've just set up some custom event-tracking code to
record specific types of clicks. Here at the TMC Library we're now
recording database clicks with that mechanism. In terms of a database list,
I've gone a few routes. When I was at the University of New Mexico, where I
had no access to backend databases for most of my tenure, I built an A-Z
list in XML that plugged into our junky CMS (Cascade Server). It worked
quite well. However, I'm more interested in extracting things like that
from a central data node, like Serials Solutions or Intota. Here at TMC
we're using Intota, and I've built a PHP script to extract the contents of
one of the reports and load them into a MySQL database for capturing that
information. At my last library we used Serials Solutions, and, while I
didn't plug that into the website, I did have to build a script that could
parse a Serials Solutions CSV file into a Google Books XML format so that Ex
Libris' rather unfortunate Primo tool could make sense of it for discovery
purposes. That file, of course, covered individual publications as well as
other linked objects. It's available on my GitHub site.
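
The report-to-database step described above might look roughly like this in Python. It is a sketch under stated assumptions, not the PHP script itself: SQLite stands in for MySQL so the example runs without a server, and the column names (title, url, provider) and sample rows are invented, since the real report layout is not given here.

```python
import csv
import io
import sqlite3

# Invented sample standing in for an exported report; real column names will differ.
REPORT_CSV = """title,url,provider
Example Index,https://index.example.com/,Example Vendor
Open Archive,https://archive.example.org/,Example Society
"""


def load_report(conn, csv_text):
    """Replace the contents of the databases table with the rows in csv_text."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS databases "
        "(title TEXT PRIMARY KEY, url TEXT, provider TEXT)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["title"], r["url"], r["provider"]) for r in reader]
    with conn:  # commits on success, rolls back if an insert fails
        conn.execute("DELETE FROM databases")
        conn.executemany("INSERT INTO databases VALUES (?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
count = load_report(conn, REPORT_CSV)
titles = [t for (t,) in conn.execute("SELECT title FROM databases ORDER BY title")]
print(count, titles)
```

Clearing and reloading the table inside one transaction means the website never sees a half-loaded list, which matters if the load runs on a schedule.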

Best regards,
*Jason Bengtson, MLIS, MA*
Innovation Architect


*Houston Academy of Medicine - The Texas Medical Center Library*
1133 John Freeman Blvd
Houston, TX   77030
http://library.tmc.edu/
www.jasonbengtson.com

On Wed, May 27, 2015 at 8:01 AM, Ken Irwin wrote:

> Hi folks,
>
> Thanks to all who responded a few weeks ago to my inquiry about updating
> the code on my library's website. Many folks suggested moving to a CMS, and
> I'm starting to look into that possibility, and particularly Drupal.
>
> In doing so, I'm hoping not to re-invent the wheel, and I'm hoping that
> maybe someone has already designed a basic infrastructure to replace the
> backbone of our current website, namely.
>
> Under our current arrangement we have an interlocking set of databases
> that performs some basic library functions:
>
> There's a database table that lists all of the databases we subscribe to.
> That database feeds a user interface that:
>
> * lists databases
>
> * counts click-thrus
>
> * routes traffic to our proxy server when appropriate
>
> * can list databases by subject area (defined in a table of
> subject associations)
>
> There's also a back-end UI to create subject/database associations,
> display click-thru stats, and generate EZproxy config files based on the table
> of library databases.
>
> Does anyone know of a freely-available set of modules/pages/etc that's
> already designed to do this sort of thing? In my imagination, lots of
> libraries would want basically this same thing, customized to their own
> particular needs, and maybe we wouldn't each have to start from scratch.
>
> Any advice?
>
> Thanks
> Ken
>


Re: [CODE4LIB] hathitrust research center workset browser [call for worksets]

2015-05-27 Thread Eric Lease Morgan
On May 26, 2015, at 11:30 AM, Eric Lease Morgan wrote:

> In my copious spare time I have hacked together a thing I’m calling the 
> HathiTrust Research Center Workset Browser, a (fledgling) tool for doing 
> “distant reading” against corpora from the HathiTrust. [0]
> 
>   [0] introductory Workset Browser blog posting - http://ntrda.me/1FUGP2g


Help me put my fledgling Browser through its paces; this is a call for 
HathiTrust Research Center worksets.

For a limited period of time, go to the HathiTrust Research Center Portal, 
create (refine or identify) a collection of personal interest, use the 
Algorithms tool to export the collection's rsync file, and send the file to me. 
[1] I will feed the rsync file to the Browser, and then send you the URL 
pointing to the results. Let’s see what happens.

[1] HathiTrust Research Center Portal - https://sharc.hathitrust.org

—
Eric Morgan


[CODE4LIB] getting started with Drupal for library website

2015-05-27 Thread Ken Irwin
Hi folks,

Thanks to all who responded a few weeks ago to my inquiry about updating the 
code on my library's website. Many folks suggested moving to a CMS, and I'm 
starting to look into that possibility, and particularly Drupal.

In doing so, I'm hoping not to re-invent the wheel, and I'm hoping that maybe 
someone has already designed a basic infrastructure to replace the backbone of 
our current website, namely.

Under our current arrangement we have an interlocking set of databases that 
performs some basic library functions:

There's a database table that lists all of the databases we subscribe to. That 
database feeds a user interface that:

* lists databases

* counts click-thrus

* routes traffic to our proxy server when appropriate

* can list databases by subject area (defined in a table of subject 
associations)

There's also a back-end UI to create subject/database associations, display 
click-thru stats, and generate EZproxy config files based on the table of library 
databases.
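
The pieces described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not an existing module: the proxy host, the sample rows, and every function name are hypothetical, and real EZproxy stanzas often need vendor-specific directives beyond Title, URL, and Domain.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse

# Hypothetical proxy prefix; substitute your own EZproxy host.
PROXY_PREFIX = "https://ezproxy.example.edu/login?url="

# Stand-in for the table of subscribed databases (all rows are invented).
DATABASES = [
    {"title": "Example Index", "url": "https://index.example.com/search",
     "proxy": True, "subjects": ["History"]},
    {"title": "Open Archive", "url": "https://archive.example.org/",
     "proxy": False, "subjects": ["History", "Art"]},
]

clicks = Counter()  # click-thru counts, keyed by database title


def link_for(db):
    """Return the URL patrons should follow, routed via the proxy when flagged."""
    return PROXY_PREFIX + db["url"] if db["proxy"] else db["url"]


def record_click(db):
    """Count one click-thru and return the destination to redirect to."""
    clicks[db["title"]] += 1
    return link_for(db)


def by_subject(databases):
    """Group database titles under each subject, both levels sorted A to Z."""
    groups = defaultdict(list)
    for db in databases:
        for subject in db["subjects"]:
            groups[subject].append(db["title"])
    return {subject: sorted(titles) for subject, titles in sorted(groups.items())}


def ezproxy_config(databases):
    """Build one simplified stanza per proxied database."""
    stanzas = []
    for db in databases:
        if db["proxy"]:
            host = urlparse(db["url"]).hostname
            stanzas.append(f"Title {db['title']}\nURL {db['url']}\nDomain {host}\n")
    return "\n".join(stanzas)


record_click(DATABASES[0])
print(by_subject(DATABASES))
print(ezproxy_config(DATABASES))
print(dict(clicks))
```

Keeping the proxy flag and subjects on each row means the public list, the click counter, and the generated config all read from the same table, which is the arrangement described above.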

Does anyone know of a freely-available set of modules/pages/etc that's already 
designed to do this sort of thing? In my imagination, lots of libraries would 
want basically this same thing, customized to their own particular needs, 
and maybe we wouldn't each have to start from scratch.

Any advice?

Thanks
Ken


[CODE4LIB] Final Call and See you soon for Code4Lib North

2015-05-27 Thread Tim Ribaric
Hello All,

In about a week the 2015 Code4Lib North meetup will be held at the St. Catharines 
Public Library Downtown branch. There are still a couple of spots available! If 
you can only make one of the two days, that works too. Details are on the wiki:

http://wiki.code4lib.org/index.php/North#Code4Lib_North:_the_Sixth._St._Catharines_Public_Library.2C_June_4_.26_5.2C_2015


To all those already registered, thanks very much for your support. Do consider 
adding a topic to the wiki, and see you next week!

Thanks,
Tim



==
Tim Ribaric
Acting Head, Library Systems & Technologies
Digital Services Librarian
Computer Science & Philosophy Liaison Librarian
@elibtronic