Re: [CODE4LIB] hathitrust

2018-01-29 Thread Tom Hutchinson
Hi, Eric –

I haven't worked with HathiTrust but I have done some de-duping projects.

How would you do one by hand?

My workflow for deduping has been:
-Export a big batch of MARC records
-Load the file into a program
-Have the program process the records one at a time
-For each record, load it into a data structure. Organize the data
structure so that duplicate records are all stored together. This may
require additional logic.
-Once all the records are loaded into the data structure, go through
the data structure and process each cluster of duplicates. Pick one
record from each cluster and write it to a new data structure used for
output.
-Go through the output data structure and write those records out to a file
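
The steps above can be sketched roughly as follows. This is only a minimal
illustration using the Java standard library, not my actual project code: the
Rec class is a hypothetical stand-in for a parsed MARC record (in practice the
records would come from a Marc4J reader), the title-normalizing matchKey is
just one guess at the "additional logic", and "keep the record with the most
fields" is an assumed selection rule.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class DedupSketch {

    /** Hypothetical stand-in for a parsed MARC record. */
    static final class Rec {
        final String id;
        final String title;
        final int fieldCount;

        Rec(String id, String title, int fieldCount) {
            this.id = id;
            this.title = title;
            this.fieldCount = fieldCount;
        }
    }

    /** Assumed matching rule: lowercase the title and collapse punctuation/whitespace. */
    static String matchKey(Rec r) {
        return r.title.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{N}]+", " ")
                .trim();
    }

    /** Load records one at a time so that duplicates end up stored together. */
    static Map<String, List<Rec>> cluster(List<Rec> records) {
        Map<String, List<Rec>> clusters = new LinkedHashMap<>();
        for (Rec r : records) {
            clusters.computeIfAbsent(matchKey(r), k -> new ArrayList<>()).add(r);
        }
        return clusters;
    }

    /** Pick one record per cluster; here, the one with the most fields (an assumption). */
    static List<Rec> pickOnePerCluster(Map<String, List<Rec>> clusters) {
        List<Rec> output = new ArrayList<>();
        for (List<Rec> group : clusters.values()) {
            Rec best = group.get(0);
            for (Rec r : group) {
                if (r.fieldCount > best.fieldCount) {
                    best = r;
                }
            }
            output.add(best);
        }
        return output;
    }

    public static void main(String[] args) {
        List<Rec> input = List.of(
                new Rec("a1", "Walden; or, Life in the woods", 12),
                new Rec("a2", "Walden, or Life in the Woods.", 20),
                new Rec("b1", "Moby Dick", 15));
        // Writing to a file is replaced by printing to keep the sketch short.
        for (Rec r : pickOnePerCluster(cluster(input))) {
            System.out.println(r.id + "\t" + r.title);
        }
    }
}
```

A LinkedHashMap is used only so the output keeps the order in which clusters
were first seen; a plain HashMap works if order does not matter.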

I use Java and Marc4J.

For one project the records had a common field I could use as an
identifier. I put them into a HashMap using that identifier as a key.
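
As a rough illustration of the identifier-as-key approach (a sketch under
assumptions, not the code from that project): the input here is a list of
tab-separated lines whose first column is the shared identifier, which is a
made-up layout, and the first record seen for each identifier is the one kept.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class IdKeyedDedup {

    /** Keep the first line seen for each identifier (first tab-separated column). */
    static List<String> keepFirstPerId(List<String> lines) {
        // LinkedHashMap instead of HashMap only to keep input order in the output.
        Map<String, String> byId = new LinkedHashMap<>();
        for (String line : lines) {
            String id = line.split("\t", 2)[0].trim();
            byId.putIfAbsent(id, line); // later duplicates are ignored
        }
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "x1\tWalden",
                "x2\tMoby Dick",
                "x1\tWalden (second copy)");
        keepFirstPerId(lines).forEach(System.out::println);
    }
}
```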

For another project, I put them into an actual database. I think it
was Derby. SQLite is also a good db for the relatively small number of
records libraries commonly work with (100k's to millions). I still
used the DB as a relatively simple map. Being able to filter and
perform additional processing steps with SQL was helpful.

OCLC Classify API can also be thrown into the mix:
http://classify.oclc.org/classify2/Classify?oclc=6741810&summary=true
http://classify.oclc.org/classify2/Classify?oclc=6741810&summary=false

Apologies if this info is too rudimentary for where you are starting
from. If it's not rudimentary enough, I'd be happy to write a simple
Java program that could be used as a starting point.

Regards,

Tom

On Thu, Jan 25, 2018 at 9:54 AM, Eric Lease Morgan wrote:
> Working with the HathiTrust Research Center data can be fun, and I sincerely 
> believe it is an under-utilized system, but creating collections sans 
> duplicates is difficult. Has anybody here figured out a “kewl” way to remove 
> duplicates?
>
> Creating HathiTrust collections is easy: do search, select items of interest, 
> and repeat until tired. One can then download a CSV file describing the 
> collection, but upon closer inspection MANY of the titles are repeated. I 
> know why this has happened, alas, but how might I 
> automatically/programmatically resolve this issue? I’ve begun experimenting 
> with OpenRefine. Does anybody else have other suggestions?
>
> —
> Eric Morgan


[CODE4LIB] JOB POSTING - Collection Management Librarian Assistant or Associate Professor, University Library

2018-01-29 Thread Myung-Ja Han
*Apologies for the cross-posting.


*Collection Management Librarian*

*Assistant or Associate Professor, University Library*

*Collection Management Services, University of Illinois Library at
Urbana-Champaign*



*Position Available:  *As soon as possible. This is a 100%-time,
twelve-month, tenure-system appointment. Rank will be appropriate to the
qualifications of the candidate selected, at a level of either Assistant
Professor or Associate Professor.



*Duties and Responsibilities:  *The University of Illinois at
Urbana-Champaign Library seeks an innovative,
collaborative, and access-oriented librarian for the position of Collection
Management Librarian. This position requires a professional with
outstanding communication, interpersonal, and facilitation skills as well
as a demonstrated ability to train and support staff through significant
change. As part of the public face of Collection Management Services, the
Collection Management Librarian will build strong relationships with the
staff of the unit, library and campus stakeholders, and the professional
community, influencing library standards and the future of technical
services. The Collection Management Librarian will work with all aspects of
the collection held in the Oak Street Library, including cataloging,
processing, ingest, ongoing collection management, and access.

A thorough knowledge of current trends and industry best practices,
data-driven decision-making and collegial problem-solving skills,
flexibility, and the ability to manage complex projects and multiple and
competing priorities are requirements of this position.



*Position Description:  *Reporting to the Head, Collection Management
Services, the Collection Management Librarian is responsible for providing
management and implementing and, if needed, developing best practices,
maintaining collegial and collaborative relationships with other units in
the Library, the campus and University, external partners, and the
profession. The Collection Management Librarian works to provide
exceptional collection management of the largest physical collection held
within the University Library, foster and promote collaborative
relationships with external partners, and manage unit staff in relation to
these responsibilities. In addition, the Collection Management Librarian
assists with the data collection, analysis and reporting needs for the unit
collections, services, and facility. The Collection Management Librarian
tracks trends, best practices, and initiatives in technical services work
relating to high density storage and uses that knowledge to help inform
Library collection management in the context of the Library’s Framework for
Strategic Action
<https://www.library.illinois.edu/geninfo/libraryinit/framework_for_strategic_action/>.



*The successful candidate will:*

·Exercise critical and independent judgment in assessing collection
and processing needs, in consultation with appropriate subject specialists;

·Create workflows for ingesting, cataloging, and processing
materials according to local and professional standards;

·Develop best practices in high density storage, collection
management, and consortial collections;

·Contribute to Library-wide, consortial, and external partner
initiatives such as the Big Ten Academic Alliance Shared Print Repository,
the HathiTrust Print Repository, and the Big Ten Academic Alliance Google
Book Search Project, working extensively with partners to help set project
goals and parameters;

·Oversee the ongoing operations of the storage vaults;

·Prepare regular and on-demand reports, including contributions to
unit annual reports, budget requests, and assessment and evaluation of
services, spaces, and collections;

·Supervise staff, hourly employees, and student assistants as
necessary in relation to high density storage, including hiring and
evaluating team members;

·Foster a positive and collaborative work culture;

·Conduct training and provide documentation for collection
processes at the Oak Street Library;

·Promote and market Library services and collections related to
high density storage;

·Collaborate with faculty and staff in the Library to assess high
density storage needs;

·Coordinate workflows with consortial and external partners;

·Work closely with the Oak Street Library public service desk to
ensure timely access to materials held in the Oak Street Library;

·Contribute to the national and international reputation of the
University Library through professional research, service, and
collaboration with appropriate colleagues and organizations.



*Qualifications*

*Required:*

·ALA-accredited Master’s degree or equivalent;

·Minimum of three years of supervisory experience with demonstrated
ability to train and mentor staff;

·Minimum of three years of supervisory and/or project 

Re: [CODE4LIB] Finalized Duty Officer List for Code4Lib 2018

2018-01-29 Thread Andromeda Yelton
Thank you to everyone who's willing to take this on! You're super great and
you make c4l a better place.

On Thu, Jan 25, 2018 at 5:25 PM, Becky Yoose wrote:

> Hello everyone,
>
> Apologies for the delay! We have a finalized list of Duty Officers for the
> Code4Lib 2018 conference:
>
> On-site Duty Officers:
>
> - Shaun Ellis
> - Mark Matienzo
> - Galen Charlton
> - Christie Peterson
> - Chad Nelson
> - Bobbi Fox
> - Bethany Nowviskie
> - Becky Yoose
>
> Online Duty Officers:
>
> - Becca Quon
> - Jill Locascio
> - Katherine Kim
> - Josh Hutchinson
> - Karen Coyle
>
> More information about how to contact Duty Officers during the conference,
> as well as Duty Officer schedules, will be posted to the conference site
> shortly. Stay tuned...
>
> Thank you,
> Becky
>



-- 
Andromeda Yelton
Senior Software Engineer, MIT Libraries: https://libraries.mit.edu/
President, Library & Information Technology Association: http://www.lita.org
http://andromedayelton.com
@ThatAndromeda 


[CODE4LIB] Job: Software Engineer at Emory University

2018-01-29 Thread Code4Lib Jobs
Reporting to the Lead Software Engineer in the Library Technology and Digital 
Strategies department, the Software Engineer helps provide strategic direction 
for the enhancement, extension, and integration of Emory University Libraries' 
technology and tools in support of faculty and student teaching, learning, 
research, and scholarship.

This position:

- Provides software development support including the identification and
  generation of software specifications and the designing, development,
  implementation and revision of software applications to meet business needs.
- Supports software applications and associated operating systems.
- Reviews software specifications, systems, models and coding using analytical
  and investigative methods and techniques to ensure required specifications
  meet system needs.
- Participates in software testing and subsequent modifications.
- Writes and edits reports to provide recommendations, conclusions and other
  data.
- Performs related responsibilities as required.

Emory is an Equal Opportunity/Affirmative Action Employer that welcomes and 
encourages diversity and seeks applications and nominations from women and 
minorities.



Brought to you by code4lib jobs: 
https://jobs.code4lib.org/jobs/28030-software-engineer


[CODE4LIB] Job: Discovery & Delivery Systems Analyst at Hamilton College

2018-01-29 Thread Code4Lib Jobs
Hamilton College and its Library and Information Technology Services (LITS) 
division are trailblazers in supporting teaching, learning and faculty/student 
scholarship. In addition to the support provided for traditional library and IT 
services, LITS staff are actively engaged in efforts to support digital 
scholarship in innovative ways, including through the internationally 
recognized Digital Humanities Initiative (DHi). We are currently a part of the 
Building the Campus of the Future: 3D Technologies in Academe EDUCAUSE/HP 
research project, and are a member of national and regional consortia including 
the Oberlin Group, ConnectNY, the Consortium of Liberal Arts Colleges, and the 
Council on Library and Information Resources. LITS and Hamilton College have a 
strong commitment to the ongoing professional development of its employees, 
encouraging them to seek opportunities to expand and strengthen their skills.

We seek a collaborative, creative, strategic thinker to join LITS in the role 
of Discovery and Delivery Systems Librarian to help support our ambitious 
initiatives. The incumbent will work alongside diverse faculty, student, 
librarian, and IT colleagues across campus and within open-source communities 
on projects that advance research, teaching, and learning.  

Reporting to the Director of Metadata and Digital Strategies, the Discovery and 
Delivery Systems Librarian serves as a technical expert related to library 
systems software, standards and infrastructure. The position demands solid 
knowledge of core library systems and collections workflows, as well as the 
ability to think strategically about how to evolve systems to meet emerging 
needs. The incumbent will manage our discovery and delivery systems as well as 
collaborate on digital humanities projects. This is the perfect professional 
opportunity for someone who would enjoy partnering with others to help guide 
Hamilton’s digital future and enhance scholars’ ability to pursue new avenues 
for learning and research.

Responsibilities:

Broadly responsible for the development and administration of current and 
emerging systems implementations, with a focus on how they integrate within and 
beyond the library, and how they are used by the diverse community of 
researchers, both at Hamilton College and beyond our campus.

Specific responsibilities include:


Maintain and enhance library catalog, discovery, and related systems. 
(Currently includes Alma, Primo, ArchivesSpace and more.)


Integrate library systems with campus and consortial systems. (e.g. Blackboard, 
LibGuides, RefWorks, InterLibrary Loan)


Maintain digital collections software. In conjunction with the DHi Lead 
Designer & Software Engineer, Library Information Systems Specialist, and 
Unix/HPC System Administrator, responsible for maintenance and configuration of 
digital collections software. (Currently CONTENTdm, Islandora (Fedora / 
Drupal), Omeka, eXist.)


Ensure library systems and patron data are aligned with college policies. (e.g. 
security, preferred name)


Contribute significantly to interface design. Working closely with the DHi Lead 
Designer & Software Engineer, Library Information Systems Specialist, and 
various stakeholders, incumbent contributes to development and implementation 
of clear (technically advanced, reflective of collection specific subject 
idiosyncrasies, yet integrated across collections) interface designs to digital 
collections and library resources.


Enhance services integration. Working closely with members of LITS, incumbent 
ensures interface designs are reflective of, and enhance, services offered by 
LITS.


Implement digital collection preservation systems.


QUALIFICATIONS


A Master of Library Science degree from an ALA-accredited institution or 
equivalent knowledge gained through education and work experience is required.


Experience with the Alma LSP is desired.


Experience with digital collections software is desired; experience with 
Islandora is a plus.


Experience working in an academic library is desired; experience at a small 
liberal arts college is a plus.


Demonstrated commitment to building and supporting diversity.




Brought to you by code4lib jobs: 
https://jobs.code4lib.org/jobs/28029-discovery-delivery-systems-analyst


[CODE4LIB] Deadline Extended: Code4Lib 2019 Conference Host Proposals Due Feb. 23rd

2018-01-29 Thread Peggy Griesinger
As no proposals were received by the due date, we are extending the call 
through the conference until February 23rd. If you're considering hosting but 
have some reservations or questions, members of the 2018 Local Planning 
Committee will be happy to meet with prospective hosts during the conference to 
chat in person about organizing a C4L conference. You can also still email me 
with questions at peggygriesinger at gmail dot com.

**

The Code4Lib community is calling for proposals to host the fourteenth annual 
Code4Lib Conference in 2019.

Prior to submitting a proposal we recommend reviewing the conference hosting 
web page [0] and How To Plan a Code4LibCon on the wiki [1] to learn more about 
the kind of venue the community seeks and the responsibilities involved with 
hosting the conference.  In particular, we would like to point out that in 
recent years, the total cost of hosting the conference has been in the low to 
middle-low six figures.

The deadline for proposals is midnight EST on Friday, February 23rd, 2018. The 
decision will be made via online popular vote. Voting dates are TBD based on 
receipt of proposals.

You can apply by making your pitch to the Code4Lib Conference Planning list 
[2] and the main Code4Lib mailing list [3], and by linking to your proposal on 
the 2019 Hosting Proposals wiki page [4]. Attention to the criteria listed on 
the conference hosting page is appreciated. Good luck!

A number of people who have helped organize past conferences are on the main 
mailing list, Slack, and IRC.  In addition, members of the Local Planning 
Committee for the 2018 Conference will also be available to potential bidders 
to answer questions; if you wish to get in touch or set up a call, please 
contact Peggy Griesinger at peggygriesinger at gmail dot com.

Here's a sample of past successful proposals:

* 2018 https://wiki.code4lib.org/2018_Hosting_Proposals#Washington_DC_2018_Code4Lib_Proposal
* 2016 https://c4l-phl.github.io/
* 2015 http://osulp.github.io/code4lib-pdx/
* 2014 https://docs.google.com/document/d/1amxzn4xs26ILszZek5nIEEfd4qHNfLjp1BAc5CU5YKw/edit
* 2012 https://sites.google.com/site/code4lib2012seattle/

[0] https://code4lib.org/conference/hosting
[1] https://wiki.code4lib.org/How_To_Plan_A_Code4LibCon
[2] http://groups.google.com/group/code4libcon
[3] https://lists.clir.org/cgi-bin/wa?A0=CODE4LIB
[4] https://wiki.code4lib.org/2019_Hosting_Proposals

Regards,
Code4Lib 2019 Host Voting Committee


Re: [CODE4LIB] Seeking for a list of popular public libraries' urls in North America

2018-01-29 Thread Chad Nelson
Will,

There are also the IMLS public library data sets that might be useful.

https://www.imls.gov/research-evaluation/data-collection/public-libraries-survey/explore-pls-data

Chad

On Mon, Jan 29, 2018, 9:07 AM Becky Yoose wrote:

> Hello Will,
>
> If you haven't checked it out already, Marshall Breeding's site has a
> section listing public libraries at
> https://librarytechnology.org/libraries/uspublic/. The site doesn't rank by
> popularity, but each public library's page has some basic statistics, such
> as service population, collection size, and circulation.
>
> Cheers,
> Becky
>
>
> --
> Becky Yoose
> Library Applications and Systems Manager
> The Seattle Public Library
>
> On Sun, Jan 28, 2018 at 2:43 PM, Will Skora wrote:
>
> > Hi,
> >
> >
> > I was writing to ask if a list of the most popular public libraries' urls
> > (North American preferably) exists and if so, if it can be shared. I don't
> > have a specific metric to define 'most popular' (like # of patrons served,
> > items in circulation, etc); instead, I'm just trying to informally compare
> > public libraries' websites on a couple characteristics and do this without
> > having to find all of their URLs.
> >
> >
> > Regards,
> >
> > Will Skora
> >
> > Web Administrator
> >
> > Cleveland Public Library
> >
> > (https://cpl.org)
> >
> > NOTICE: This e-mail message and all attachments transmitted with it are
> > intended solely for the use of the addressees and may contain legally
> > privileged, protected or confidential information. If you have received
> > this message in error, and/or you are not the intended recipient, please
> > notify the sender immediately by e-mail reply and please delete this
> > message from your computer and destroy any copies. Any unauthorized use,
> > reproduction, forwarding, distribution, or other dissemination of this
> > transmission is strictly prohibited and may be unlawful.
> >
>


Re: [CODE4LIB] Seeking for a list of popular public libraries' urls in North America

2018-01-29 Thread Becky Yoose
Hello Will,

If you haven't checked it out already, Marshall Breeding's site has a
section listing public libraries at
https://librarytechnology.org/libraries/uspublic/. The site doesn't rank by
popularity, but each public library's page has some basic statistics, such
as service population, collection size, and circulation.

Cheers,
Becky


--
Becky Yoose
Library Applications and Systems Manager
The Seattle Public Library

On Sun, Jan 28, 2018 at 2:43 PM, Will Skora wrote:

> Hi,
>
>
> I was writing to ask if a list of the most popular public libraries' urls
> (North American preferably) exists and if so, if it can be shared. I don't
> have a specific metric to define 'most popular' (like # of patrons served,
> items in circulation, etc); instead, I'm just trying to informally compare
> public libraries' websites on a couple characteristics and do this without
> having to find all of their URLs.
>
>
> Regards,
>
> Will Skora
>
> Web Administrator
>
> Cleveland Public Library
>
> (https://cpl.org)