Re: [CODE4LIB] hathitrust research center workset browser [github]
I believe I have created a repository of my HTRC Workset Browser code (shell and Python scripts) on GitHub. [1] From the Quick Start section of the README:

  1. Download the software, putting the bin and etc directories in the same directory.
  2. Change to the directory where the bin and etc directories have been saved.
  3. Build a collection by issuing the following command:

       ./bin/build-corpus.sh thoreau etc/rsync-thoreau.sh

If all goes well, the Browser will create a new directory named thoreau, rsync a bunch o' JSON files from the HathiTrust to your computer, index the JSON files, do some textual analysis against the corpus, create a simple database ("catalog"), and create a few more reports. You can then peruse the files in the newly created thoreau directory. If this worked, then repeat the process for the other rsync files found in the etc directory.

Probably the first issue people will have is the path to their version of Python. (Sigh.)

[1] repository - https://github.com/ericleasemorgan/HTRC-Workset-Browser

--
Eric “Git Ignorant” Morgan
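The last Quick Start step, repeating the build for the other rsync files in etc, could be scripted. What follows is a minimal sketch, not part of the repository: it assumes every rsync file follows the rsync-<name>.sh pattern of the thoreau example (the only name the post shows), and the helper names collection_name, build_commands, and run_all are made up for illustration.

```python
import subprocess
from pathlib import Path


def collection_name(rsync_file):
    """Derive a collection name from a path such as etc/rsync-thoreau.sh.

    Assumption: files are named rsync-<name>.sh, as in the thoreau example.
    """
    stem = Path(rsync_file).stem  # e.g. "rsync-thoreau"
    prefix = "rsync-"
    if not stem.startswith(prefix):
        raise ValueError(f"unexpected rsync file name: {rsync_file}")
    return stem[len(prefix):]


def build_commands(etc_dir="etc", script="./bin/build-corpus.sh"):
    """Return one build-corpus.sh command (as an argument list) per rsync file."""
    commands = []
    for rsync_file in sorted(Path(etc_dir).glob("rsync-*.sh")):
        commands.append([script, collection_name(rsync_file), rsync_file.as_posix()])
    return commands


def run_all(etc_dir="etc"):
    """Run the builds one after another, stopping at the first failure."""
    for command in build_commands(etc_dir):
        print("running:", " ".join(command))
        subprocess.run(command, check=True)
```

Calling run_all() from the directory that holds bin and etc would issue the same command as step 3 once per rsync file. Each build mirrors and analyzes a whole corpus, so running them in sequence may take a while.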
Re: [CODE4LIB] hathitrust research center workset browser
Right. Which is why *someone* copied all of the Google digitized books to the Internet Archive -- someone not associated with the library partners. So generally if you cannot download from HT you can find the same scan via openlibrary.org.

Unfortunately that doesn't help with using the tool that ELM has alerted us to.

kc

On 6/1/15 2:19 PM, Jimmy Ghaphery wrote:
> I think we are in agreement (especially about the utility of all things HathiTrust). My one point is that any restrictions on digitized public domain works, as I understand it, are not related to copyright.

--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
m: +1-510-435-8234
skype: kcoylenet/+1-510-984-3600
Re: [CODE4LIB] hathitrust research center workset browser
I think we are in agreement (especially about the utility of all things HathiTrust). My one point is that any restrictions on digitized public domain works, as I understand it, are not related to copyright.

On Mon, Jun 1, 2015 at 5:00 PM, Terry Reese wrote:
> >> However, the digitizing agency cannot dictate any copyright
> >> restrictions on the digitized copies once released to the public
>
> The digital objects have not, and as far as I understand, cannot be made available to the public if digitized as part of the google books digitization project. Most institutions got very limited use, and generally these were tied to their specific, immediate, communities. Though, with that said each institution has slightly different terms.
>
> For what it's worth, the research center does not make the digital copies available for download -- it provides tools for working with data in aggregate (worksets) and provides a proof of concept environment demonstrating the feasibility of creating a secured data repository with I believe the long-term goal of providing data mining for the entire hathitrust resources (both within and outside of the public domain). But even as it stands now, the tool has become a fantastic teaching tool when talking to instructors and graduate students looking for large data sets to work with, that also includes some pretty interesting research algorithms for working with the data.
>
> --tr

--
Jimmy Ghaphery
Head, Digital Technologies
VCU Libraries
804-827-3551
Re: [CODE4LIB] hathitrust research center workset browser
>> However, the digitizing agency cannot dictate any copyright
>> restrictions on the digitized copies once released to the public

The digital objects have not, and as far as I understand, cannot be made available to the public if digitized as part of the google books digitization project. Most institutions got very limited use, and generally these were tied to their specific, immediate, communities. Though, with that said each institution has slightly different terms.

For what it's worth, the research center does not make the digital copies available for download -- it provides tools for working with data in aggregate (worksets) and provides a proof of concept environment demonstrating the feasibility of creating a secured data repository with I believe the long-term goal of providing data mining for the entire hathitrust resources (both within and outside of the public domain). But even as it stands now, the tool has become a fantastic teaching tool when talking to instructors and graduate students looking for large data sets to work with, that also includes some pretty interesting research algorithms for working with the data.

--tr

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Jimmy Ghaphery
Sent: Monday, June 1, 2015 4:47 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] hathitrust research center workset browser

Thanks Eric for posting the webinar in the other thread.

I am pretty sure that digitizing something in the public domain does not change its copyright status, at least in the U.S. The digitizing agency certainly has the right to sell, restrict access, watermark, or even keep the scans locked up on a thumb drive in a closet. They are not obligated to share or to provide the digital files in a re-usable format. However, the digitizing agency cannot dictate any copyright restrictions on the digitized copies once released to the public.

#iamnotalawyer and welcome correction

best,

Jimmy
Re: [CODE4LIB] hathitrust research center workset browser
Thanks Eric for posting the webinar in the other thread.

I am pretty sure that digitizing something in the public domain does not change its copyright status, at least in the U.S. The digitizing agency certainly has the right to sell, restrict access, watermark, or even keep the scans locked up on a thumb drive in a closet. They are not obligated to share or to provide the digital files in a re-usable format. However, the digitizing agency cannot dictate any copyright restrictions on the digitized copies once released to the public.

#iamnotalawyer and welcome correction

best,

Jimmy

On Mon, Jun 1, 2015 at 12:12 PM, Eric Lease Morgan wrote:
> On Jun 1, 2015, at 10:58 AM, davesgonechina wrote:
>
> > They just informed me I need a .edu address. Having trouble understanding
> > the use of the term "public domain" here.
>
> Gung fhpx, naq fbhaqf ernyyl fbeg bs fghcvq!! --RYZ

--
Jimmy Ghaphery
Head, Digital Technologies
VCU Libraries
804-827-3551
Re: [CODE4LIB] hathitrust research center workset browser
On Jun 1, 2015, at 10:58 AM, davesgonechina wrote:

> They just informed me I need a .edu address. Having trouble understanding
> the use of the term "public domain" here.

Gung fhpx, naq fbhaqf ernyyl fbeg bs fghcvq!! --RYZ
Re: [CODE4LIB] hathitrust research center workset browser
I know that Robert McDonald lurks around here -- so he could clarify this -- but what folks need to realize here is that the research center is providing tools that allow research access to materials within the hathitrust that are within the public domain. However, the digitized materials themselves are not public domain any more (as I understand it). These materials, as I understand, are governed by the agreements institutions made as part of the google project. So, while the materials that the research center is currently providing access to are ones identified as within the public domain, access to the research center is curated due to those agreements.

Robert or someone else can clarify if I've misspoken based on my understanding here.

--tr

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of davesgonechina
Sent: Monday, June 1, 2015 10:58 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] hathitrust research center workset browser

They just informed me I need a .edu address. Having trouble understanding the use of the term "public domain" here.
Re: [CODE4LIB] hathitrust research center workset browser
They just informed me I need a .edu address. Having trouble understanding the use of the term "public domain" here.

On Mon, Jun 1, 2015, 9:58 PM Eric Lease Morgan wrote:
> On Jun 1, 2015, at 4:33 AM, davesgonechina wrote:
>
> > If your *institutional* email address is not on their whitelist (not sure if it is limited to subscribing ones, they don't say) you cannot register using the signup form, instead you can only request an account by briefly explaining why you want one. Weird, because they'd have potentially learned more about me if they just let me put my gmail address in the signup form.
> >
> > I don't get it - can all users download public domain content? If they give me an account, will I be indistinguishable from a subscribing institution? If not, why the extra hoops?
>
> Dave, you are the second person to bring this “white listing” issue to my attention. Bummer! Yes, apparently, unless your email address is a part of wider something or another, then you need to be authorized to use the Research Center. Weird! In my opinion, while the Research Center’s tools work, I believe the site suffers from usability issues.
>
> In any event, I have enhanced the auto-generated reports created by my “Browser”, and while they are very textual, I also believe they are insightful. For example, the complete works of:
>
>   * William Ellery Channing - http://bit.ly/browser-channing-about
>   * Jane Austen - http://bit.ly/browser-austen-about
>   * Ralph Waldo Emerson - http://bit.ly/browser-emerson-about
>   * Henry David Thoreau - http://bit.ly/browser-thoreau-about
>
> --
> Eric “Beginning To Suffer From ‘Creeping Featuritis’” Morgan
Re: [CODE4LIB] hathitrust research center workset browser
On Jun 1, 2015, at 4:33 AM, davesgonechina wrote:

> If your *institutional* email address is not on their whitelist (not sure if it is limited to subscribing ones, they don't say) you cannot register using the signup form, instead you can only request an account by briefly explaining why you want one. Weird, because they'd have potentially learned more about me if they just let me put my gmail address in the signup form.
>
> I don't get it - can all users download public domain content? If they give me an account, will I be indistinguishable from a subscribing institution? If not, why the extra hoops?

Dave, you are the second person to bring this “white listing” issue to my attention. Bummer! Yes, apparently, unless your email address is a part of wider something or another, then you need to be authorized to use the Research Center. Weird! In my opinion, while the Research Center’s tools work, I believe the site suffers from usability issues.

In any event, I have enhanced the auto-generated reports created by my “Browser”, and while they are very textual, I also believe they are insightful. For example, the complete works of:

  * William Ellery Channing - http://bit.ly/browser-channing-about
  * Jane Austen - http://bit.ly/browser-austen-about
  * Ralph Waldo Emerson - http://bit.ly/browser-emerson-about
  * Henry David Thoreau - http://bit.ly/browser-thoreau-about

--
Eric “Beginning To Suffer From ‘Creeping Featuritis’” Morgan
Re: [CODE4LIB] hathitrust research center workset browser
If your *institutional* email address is not on their whitelist (not sure if it is limited to subscribing ones, they don't say) you cannot register using the signup form, instead you can only request an account by briefly explaining why you want one. Weird, because they'd have potentially learned more about me if they just let me put my gmail address in the signup form.

I don't get it - can all users download public domain content? If they give me an account, will I be indistinguishable from a subscribing institution? If not, why the extra hoops?

On Fri, May 29, 2015 at 1:51 AM, Eric Lease Morgan wrote:
> On May 27, 2015, at 6:33 PM, Karen Coyle wrote:
>
> >> In my copious spare time I have hacked together a thing I’m calling the HathiTrust Research Center Workset Browser, a (fledgling) tool for doing “distant reading” against corpora from the HathiTrust. [0, 1] ...
> >>
> >> 'Want to give it a try? For a limited period of time, go to the HathiTrust Research Center Portal, create (refine or identify) a collection of personal interest, use the Algorithms tool to export the collection's rsync file, and send the file to me. I will feed the rsync file to the Browser, and then send you the URL pointing to the results.
> >>
> >> [0] introduction in a blog posting - http://ntrda.me/1FUGP2g
> >> [1] HTRC Workset Browser - http://bit.ly/workset-browser
> >
> > Eric, what happens if you access this from a non-HT institution? When I go to HT I am often unable to download public domain titles because they aren't available to members of the general public.
>
> The short answer is, “Nothing”.
>
> The long answer is… longer. The HathiTrust proper is accessible to anybody, but the downloading of public domain content is only available to subscribing institutions.
>
> On the other hand, the “Workset Browser” is designed to work off the HathiTrust Research Center Portal, not the HathiTrust proper. The Portal is located at http://sharc.hathitrust.org. From there anybody can search the collection of public domain content, create collections, and apply various algorithms against collections. One of the algorithms is “create RSYNC file” which, in turn, allows you to download bunches o’ metadata describing the items in your collection. (There is also a “download as MARC” algorithm.) This rsync file is the root of the Workset Browser. Feed the Browser an rsync file, and the Browser will mirror content locally, index it, and generate reports describing the collection.
>
> Thank you for asking. Many people do not know there is a HathiTrust Research Center.
>
> --
> Eric Morgan
Re: [CODE4LIB] hathitrust research center workset browser
On May 27, 2015, at 6:33 PM, Karen Coyle wrote:

>> In my copious spare time I have hacked together a thing I’m calling the HathiTrust Research Center Workset Browser, a (fledgling) tool for doing “distant reading” against corpora from the HathiTrust. [0, 1] ...
>>
>> 'Want to give it a try? For a limited period of time, go to the HathiTrust Research Center Portal, create (refine or identify) a collection of personal interest, use the Algorithms tool to export the collection's rsync file, and send the file to me. I will feed the rsync file to the Browser, and then send you the URL pointing to the results.
>>
>> [0] introduction in a blog posting - http://ntrda.me/1FUGP2g
>> [1] HTRC Workset Browser - http://bit.ly/workset-browser
>
> Eric, what happens if you access this from a non-HT institution? When I go to HT I am often unable to download public domain titles because they aren't available to members of the general public.

The short answer is, “Nothing”.

The long answer is… longer. The HathiTrust proper is accessible to anybody, but the downloading of public domain content is only available to subscribing institutions.

On the other hand, the “Workset Browser” is designed to work off the HathiTrust Research Center Portal, not the HathiTrust proper. The Portal is located at http://sharc.hathitrust.org. From there anybody can search the collection of public domain content, create collections, and apply various algorithms against collections. One of the algorithms is “create RSYNC file” which, in turn, allows you to download bunches o’ metadata describing the items in your collection. (There is also a “download as MARC” algorithm.) This rsync file is the root of the Workset Browser. Feed the Browser an rsync file, and the Browser will mirror content locally, index it, and generate reports describing the collection.

Thank you for asking. Many people do not know there is a HathiTrust Research Center.

--
Eric Morgan
Re: [CODE4LIB] hathitrust research center workset browser
Eric, what happens if you access this from a non-HT institution? When I go to HT I am often unable to download public domain titles because they aren't available to members of the general public.

kc

On 5/26/15 8:30 AM, Eric Lease Morgan wrote:
> In my copious spare time I have hacked together a thing I’m calling the HathiTrust Research Center Workset Browser, a (fledgling) tool for doing “distant reading” against corpora from the HathiTrust. [1] ...
>
> [1] HTRC Workset Browser - http://bit.ly/workset-browser

--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
m: +1-510-435-8234
skype: kcoylenet/+1-510-984-3600
Re: [CODE4LIB] hathitrust research center workset browser [call for worksets]
On May 26, 2015, at 11:30 AM, Eric Lease Morgan wrote:

> In my copious spare time I have hacked together a thing I’m calling the HathiTrust Research Center Workset Browser, a (fledgling) tool for doing “distant reading” against corpora from the HathiTrust. [0]
>
> [0] introductory Workset Browser blog posting - http://ntrda.me/1FUGP2g

Help me put my fledgling Browser through its paces; this is a call for HathiTrust Research Center worksets. For a limited period of time, go to the HathiTrust Research Center Portal, create (refine or identify) a collection of personal interest, use the Algorithms tool to export the collection's rsync file, and send the file to me. [1] I will feed the rsync file to the Browser, and then send you the URL pointing to the results. Let’s see what happens.

[1] HathiTrust Research Center Portal - https://sharc.hathitrust.org

--
Eric Morgan
[CODE4LIB] hathitrust research center workset browser
In my copious spare time I have hacked together a thing I’m calling the HathiTrust Research Center Workset Browser, a (fledgling) tool for doing “distant reading” against corpora from the HathiTrust. [1]

The idea is to:

  1) create, refine, or identify a HathiTrust Research Center workset of interest -- your corpus,
  2) feed the workset’s rsync file to the Browser,
  3) have the Browser download, index, and analyze the corpus, and
  4) enable the reader to search, browse, and interact with the result of the analysis.

With varying success, I have done this with a number of worksets on topics ranging from literature and philosophy to Rome and cookery. The best working examples are the ones from Thoreau and Austen. [2, 3] The others are still buggy.

As a further example, the Browser can/will create reports describing the corpus as a whole. This analysis includes the size of a corpus measured in pages as well as words, date ranges, word frequencies, and selected items of interest based on pre-set “themes” -- usage of color words, names of “great” authors, and a set of timeless ideas. [4] This report is based on more fundamental reports such as frequency tables, a “catalog”, and lists of unique words. [5, 6, 7, 8]

The whole thing is written in a combination of shell and Python scripts. It should run on just about any out-of-the-box Linux or Macintosh computer. Take a look at the code. [9] No special libraries needed. (“Famous last words.”) In its current state, it is very Unix-y. Everything is done from the command line. Lots of plain text files and the exploitation of STDIN and STDOUT. Like a Renaissance cartoon, the Browser, in its current state, is only a sketch. Only later will a more full-bodied, Web-based interface be created.

The next steps are numerous and listed in no priority order:

  * putting the whole thing on GitHub
  * outputting the reports in generic formats so other things can easily read them
  * improving the terminal-based search interface
  * implementing a Web-based search interface
  * writing advanced programs in R that chart and graph analysis
  * providing a means for comparing & contrasting two or more items from a corpus
  * indexing the corpus with a (real) indexer such as Solr
  * writing a “cookbook” describing how to use the browser to do “kewl” things
  * making the metadata of corpora available as Linked Data
  * etc.

'Want to give it a try? For a limited period of time, go to the HathiTrust Research Center Portal, create (refine or identify) a collection of personal interest, use the Algorithms tool to export the collection's rsync file, and send the file to me. I will feed the rsync file to the Browser, and then send you the URL pointing to the results. [10] Let’s see what happens.

Fun with public domain content, text mining, and the definition of librarianship.

Links

  [1] HTRC Workset Browser - http://bit.ly/workset-browser
  [2] Thoreau - http://bit.ly/browser-thoreau
  [3] Austen - http://bit.ly/browser-austen
  [4] Thoreau report - http://ntrda.me/1LD3xds
  [5] Thoreau dictionary (frequency list) - http://bit.ly/thoreau-dictionary
  [6] usage of color words in Thoreau - http://bit.ly/thoreau-colors
  [7] unique words in the corpus - http://bit.ly/thoreau-unique
  [8] Thoreau “catalog” - http://bit.ly/thoreau-catalog
  [9] source code - http://ntrda.me/1Q8pPoI
  [10] HathiTrust Research Center - https://sharc.hathitrust.org

--
Eric Lease Morgan, Librarian
University of Notre Dame
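The corpus-level reports mentioned in the post (word frequencies, lists of unique words, and pre-set “themes” such as color words) can be illustrated in a few lines of Python. This is only a sketch of the general idea, not the Browser's actual code: the function names, the simple tokenizer, and the short COLOR_WORDS list are all assumptions made for the example, and "unique words" is read here as the distinct words in the corpus.

```python
import re
from collections import Counter

# Hypothetical theme list; the Browser's real list of color words is not shown in the post.
COLOR_WORDS = {"red", "green", "blue", "white", "black", "yellow"}


def tokenize(text):
    """Lowercase the text and split it into runs of letters.

    This is deliberately crude: "don't" becomes "don" and "t".
    """
    return re.findall(r"[a-z]+", text.lower())


def frequency_table(text):
    """Count how often each word occurs in the text."""
    return Counter(tokenize(text))


def theme_usage(frequencies, theme_words):
    """Return the counts of the theme words that actually occur."""
    return {word: frequencies[word] for word in sorted(theme_words) if frequencies[word] > 0}


def unique_words(frequencies):
    """Return the distinct words of the corpus in alphabetical order."""
    return sorted(frequencies)
```

For instance, frequency_table("The green pond. The blue sky and the green woods.") counts "the" three times and "green" twice, and passing that table to theme_usage along with COLOR_WORDS keeps only the entries for "blue" and "green".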