Re: [CODE4LIB] Identification of damaged text

2015-12-01 Thread Jesse Martinez
Hi Jason,

That sounds really interesting! Can you share a link to this game or code?

Jesse




-- 
Jesse Martinez
Web Services Librarian
O'Neill Library, Boston College
jesse.marti...@bc.edu
617-552-2509


[CODE4LIB] Identification of damaged text

2015-12-01 Thread Christine Mayo
Hi all,

I have an interesting assessment issue with some recently digitized
newspapers that I hope someone can shed some light on. We sent a batch of
19th-century newspapers off to a vendor knowing they weren't in great shape,
and now we have to decide whether the resulting images (TIFFs) are usable or
whether we should be looking for alternative copies and/or microfilm.

A lot of the images are in decent shape, but the first few pages of each
issue are heavily creased and generally missing a smallish piece from the
center of the page where the folds met. I'm looking for a way to
programmatically identify how much text is missing or unusable on each page.
We haven't run OCR yet; part of this assessment is to figure out whether we
should bother sending these items out for OCR and METS/ALTO creation, but I
suspect we could run a quick and dirty in-house OCR pass if that would help.
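
For instance, a pass as simple as this might be enough to flag the worst
pages (a rough sketch, assuming pytesseract and Pillow are installed and
Tesseract is on the PATH; the directory name and the 60% cutoff are
placeholders we'd have to tune):

# Quick-and-dirty damage triage: OCR each page and record the mean
# word-level confidence. Pages with unusually low mean confidence are
# likely the creased/damaged ones worth a manual look.
from pathlib import Path

import pytesseract
from PIL import Image

for tiff in sorted(Path("pages").glob("*.tif")):
    data = pytesseract.image_to_data(
        Image.open(tiff), output_type=pytesseract.Output.DICT
    )
    # image_to_data reports a confidence per recognized word; -1 marks
    # non-word boxes, so filter those out before averaging.
    confs = [float(c) for c in data["conf"] if float(c) >= 0]
    mean_conf = sum(confs) / len(confs) if confs else 0.0
    flag = "  <-- check by hand" if mean_conf < 60 else ""
    print(f"{tiff.name}: {len(confs)} words, mean conf {mean_conf:.1f}{flag}")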

We can go through the images by hand and try to measure and/or count, but
if anyone's worked on something like this or has thoughts, I'd love to hear
them!

Thanks,
Christine

-- 
Christine Mayo
Digital Production Librarian
Thomas P. O'Neill, Jr. Library
Boston College
140 Commonwealth Avenue
Chestnut Hill, MA 02467
christine.m...@bc.edu


Re: [CODE4LIB] Identification of damaged text

2015-12-01 Thread Jason Bengtson
This may be a dumb thought, but I built a game a couple of years ago that
tracked results on a map (an HTML canvas, with the map set as the background
and objects drawn on top of it) by counting the pixels of a certain color and
comparing them, as a percentage, against the pixels in the whole map. You
could do something similar by comparing black or gray pixels beyond a
particular threshold against total pixels. That would be a pretty
rough-and-ready approach, but it might be worth a shot. If the missing
sections have a significantly different color from the rest of the image,
that could be another metric to use.
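
In Python with Pillow, the pixel-counting idea looks roughly like this (just
a sketch; the threshold and the directory name are made up and would need
tuning against the actual scans):

# Rough damage estimate: what fraction of a page image is darker than a
# threshold? On a newsprint scan that ratio is mostly ink, so a page with
# a hole or heavy creasing should stand out from its neighbors.
from pathlib import Path

from PIL import Image

THRESHOLD = 64  # 0 = black, 255 = white; pixels below this count as dark

for tiff in sorted(Path("pages").glob("*.tif")):
    gray = Image.open(tiff).convert("L")  # force 8-bit grayscale
    histogram = gray.histogram()          # 256 bins of pixel counts
    dark = sum(histogram[:THRESHOLD])
    total = gray.width * gray.height
    print(f"{tiff.name}: {dark / total:.1%} dark pixels")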

Best regards,
*Jason Bengtson, MLIS, MA*
Innovation Architect


*Houston Academy of Medicine - Texas Medical Center Library*
1133 John Freeman Blvd
Houston, TX 77030
http://library.tmc.edu/
www.jasonbengtson.com



[CODE4LIB] Job: Head, Design & Discovery at University of Michigan

2015-12-01 Thread jobs
Head, Design & Discovery
University of Michigan
Ann Arbor

**Head, Design & Discovery**  
  
**How to Apply**  
A cover letter is required for consideration for this position and should be
attached as the first page of your resume. The cover letter should address
your specific interest in the position, include your salary requirements, and
outline skills and experience that directly relate to this position.

  
**Job Summary**  
The University of Michigan Library seeks an experienced professional to manage
and lead the newly created Design & Discovery unit within the Library
Information Technology Division (LIT). Design & Discovery, composed of twelve
talented and experienced designers, librarians, and developers, encompasses
three essential IT service areas:

  * LIT-wide program and service management
  * User experience (usability, accessibility, content, information 
architecture, design)
  * Front-end web application development

The Head of Design & Discovery, reporting to the Associate University
Librarian for LIT, provides leadership for programmatic initiatives of the
unit; directs management of the unit's project and operational portfolio; and
collaborates with the IT leadership team to guide the strategic direction of
the division as a whole. The work of the Design & Discovery unit is highly
collaborative, with projects and initiatives involving staff from across LIT,
the Library, the campus, and beyond. Unit activities include coordinating
strategic initiatives; designing user-centered IT service models and
cross-division workflows; providing analytics-driven guidance for public discovery,
access, and content systems; promoting the adoption of technology policies and
standards; and coordinating IT project and resource stewardship initiatives.
The unit's service and operational portfolio includes the library website;
in-house and vendor-based cross-platform search and discovery systems; user
interface frameworks; library staff business workflow and content tools; and
digital exhibits. Within its design portfolio, the unit promotes user-centered
design while providing leadership and expertise in User Experience (UX)
strategy, user research, assessment, content creation and management, and web
accessibility.

  
The successful candidate will work as part of a team of IT managers, with a
combined staff of over 60 FTE, focused on realizing the dual mission of the
division: enabling library services through elegant technology solutions; and
uniting the preservation, access, and publishing of digital content.
Collectively, the division supports the development and upkeep of fundamental
services including the library website; information discovery and access
applications; the library management system; learning analytics; and learning
technologies.

  
The University of Michigan Library is one of the world's largest academic
research libraries and serves a vibrant university community that is home to
19 schools and colleges, 100 top-ten graduate programs, and annual research
expenditures approaching $1.5 billion. To enable the university's
world-changing work and to serve the public good, the library collects,
preserves, and shares the scholarly and cultural record in all existing and
emerging forms, and leads the reinvention of the academic research library in
the digital age.

  
The library is committed to recruiting and retaining a diverse workforce and
encourages all employees to fully incorporate their diverse backgrounds,
skills, and life experiences into their work and towards the fulfillment of
the library's mission.

  
**Responsibilities**  
_Leadership and strategy_ - Contributes to strategic planning for LIT in the
context of library and university goals, and translates strategic thinking
into goal-oriented planning and implementation road maps for Design &
Discovery priorities and areas of activity

  
_Management and supervision_ - Facilitates operational excellence for the
Design & Discovery unit, including resource management, staff management,
mentoring and training, and general supervision

  
_Design and development_ - Takes part in high-level design and development of
applications, methodologies, and services in cooperation with a spectrum of
stakeholders within LIT and across the library

  
_Research, teaching, and publication_ - Participates actively in the larger
professional community by exploring relevant topics, and works to share
knowledge through regular presentations, publications, and teaching

  
**Required Qualifications**

  * An ALA-accredited master's degree or an advanced degree in a related field 
such as Interaction Design, Experience Design, Information Architecture, 
Knowledge Management, or IT Management, and five or more years of relevant 
experience, or an equivalent combination of a relevant advanced degree and 
experience.
  * At least three years of staff management experience
  * Demonstrated understanding of the role and potential of technology 

Re: [CODE4LIB] MARC to EAD for ArchivesSpace import stylesheet

2015-12-01 Thread Kari R Smith
This is great - thanks for sharing, Nicole!  I copied the ArchivesSpace User 
Group as well.

Kari Smith

From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of Smeltekop, 
Nicole [nic...@mail.lib.msu.edu]
Sent: Thursday, November 19, 2015 13:36
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] MARC to EAD for ArchivesSpace import stylesheet

Hi all,

I've developed a stylesheet that will convert MARC records to EAD for import 
into ArchivesSpace using MarcEdit.  This is really useful for those migrating 
from platforms other than Archon or AT who have catalog records for collections.

It's available here: https://github.com/MSU-Libraries/MARCtoEADforASpace.
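
If you want to test the transform outside MarcEdit, a minimal sketch along
these lines should also work (assuming Python with lxml installed; the
stylesheet and record filenames below are placeholders, not the actual names
in the repo):

# Apply the MARC-to-EAD stylesheet to an exported MARCXML record.
# Filenames are placeholders; substitute the stylesheet from the repo
# and your own MARCXML file.
from lxml import etree

transform = etree.XSLT(etree.parse("marc2ead.xsl"))
ead = transform(etree.parse("collection_record.xml"))

with open("collection_ead.xml", "wb") as out:
    out.write(etree.tostring(ead, pretty_print=True,
                             xml_declaration=True, encoding="UTF-8"))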

Just a few notes:

* This stylesheet maps MARC bibliographic data to the ASpace flavor of EAD.
If your finding aid needs container listings added, you'll have to add those
manually, as they are not part of the MARC bib record. I've indicated where to
do that in the stylesheet.

* There are also some required MARC fields for the stylesheet to output
required ASpace fields (such as the 300). If the original MARC record doesn't
have these fields, you'll get an error when importing into ArchivesSpace. The
error log does tell you if you are missing something (for a missing 300 field,
it's an extent statement), so you can add that into the EAD file.

* This stylesheet includes fields not in the LC MARC-to-EAD crosswalk. They
were fields used in our records (such as 590s or 246s) that import into ASpace
as notes with a header using the MARC field title.

Let me know if you have any questions or issues.  As I mention in the GitHub
repo, I'm a newbie, so if anyone sees places to improve the code, please let
me know!

Cheers,

-Nicole

Nicole Garrett Smeltekop
Special Materials Catalog Librarian
Michigan State University Libraries
366 W. Circle Drive, Room W108C
East Lansing, MI 48824
517-884-0818
nic...@msu.edu


[CODE4LIB] ALCTS Technical Services Workflow Efficiency Interest Group (TSWEIG) at the 2016 ALA Midwinter Meeting in Boston, MA

2015-12-01 Thread Glerum, Margaret
This message has been sent out to multiple lists. Please excuse any duplication.

Please join the ALCTS Technical Services Workflow Efficiency Interest Group 
(TSWEIG) at the 2016 ALA Midwinter Meeting in Boston, MA.

Time: Monday, January 11, 2016, 1:00 p.m. - 2:30 p.m.
Place: Boston Convention and Exhibition Center, Room 103

Streamlining ETD Processing at the University of Iowa Libraries using Trello 
Board
Amanda Z. Xu, Metadata Analyst Librarian, University of Iowa Libraries

ETD processing at the University of Iowa Libraries is a complex workflow
requiring project management and collaboration with project stakeholders in
the Cataloging and Metadata department and in other departments, including
Digital Publishing and Preservation. The digital scholarship librarian
receives XML and PDF files from ProQuest, and the Preservation Metadata
Librarian copies these files into an archive for digital preservation. Another
copy of the files is generated for the Cataloging and Metadata department to
process the ETDs for the Iowa Research Online (IRO) institutional repository
and OCLC Connexion. This presentation will describe the workflows and
collaboration of ETD processing at the University of Iowa Libraries, and the
implementation of a Trello board for tracking the ETD workflow.

From Excel ETD Metadata to MARC Bib and NACO Records in 4 8 12 easy steps!
Steven W. Holloway, Metadata Librarian, James Madison University

At JMU the library receives ETD metadata from our institutional repository as
bulk Excel files.  A combination of open-source and home-grown XSLTs permits
us to generate complete MARCXML RDA bibliographic records that we export to
OCLC after minor editing; subject headings are assigned at a later point.  We
also create NACO records for the dissertants, based on asking the right
questions in the ETD submission form, and use XSLT transformations for this as
well.  The XML files are stored and edited in an eXist-db (native XML
database) instance set up as a web service.  There are several steps in the
workflow, but the system is scaled to accommodate many hundreds of ETD
submissions at a time, and it can be adapted for any spreadsheet-based
metadata amenable to transformation into MARC or BIBFRAME formats.
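
(As a rough illustration of the spreadsheet-to-MARC step, here is a minimal
Python sketch using openpyxl and pymarc 5.x in place of the XSLT pipeline
described above; the column layout, filenames, and field choices are invented
for the example:)

# Build minimal MARCXML bib records from spreadsheet rows.
# Assumes a sheet laid out as: author | title | year, with a header row.
from openpyxl import load_workbook
from pymarc import Field, Record, Subfield, record_to_xml

wb = load_workbook("etds.xlsx", read_only=True)

for author, title, year in wb.active.iter_rows(min_row=2, values_only=True):
    record = Record()
    record.add_field(
        Field(tag="100", indicators=["1", " "],
              subfields=[Subfield("a", str(author))]),
        Field(tag="245", indicators=["1", "0"],
              subfields=[Subfield("a", str(title))]),
        Field(tag="264", indicators=[" ", "1"],
              subfields=[Subfield("c", str(year))]),
    )
    print(record_to_xml(record).decode("utf-8"))  # one <record> per row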

Catalog ALL THE THINGS: Leveraging Automation to Catalog a Massive Audio-Visual 
Collection

Lucas Mak, Metadata and Catalog Librarian; Autumn Faulkner, Head of Copy 
Cataloging; and Joshua Barton, Head of Cataloging and Metadata Services & 
Assistant Head of Technical Services, Michigan State University Libraries


Michigan State University Libraries (MSUL) recently received a gift of more 
than 800,000 titles of sound and video recordings. Even though a minimal set of 
metadata was provided by the donor, the sheer quantity still posed an 
unprecedented challenge for cataloging. However, with the help of scripting and 
APIs for various online metadata sources, MSUL was able to catalog and make the 
collection available for circulation six months after the receipt of this gift. 
This presentation will discuss the design and execution of this automated 
workflow, its limitations, unintended consequences, responses to resulting 
problems, and follow-up record enrichment plans, as well as what we might do 
differently if we had the chance.

From MODS to OCLC through the WorldCat Metadata API
Shaun Akhtar, Metadata Librarian, Dartmouth College Library

The Dartmouth College Library's MODS repository is its primary metadata source 
for local digital collections and items. The library wants to incorporate the 
original cataloging done in MODS for both legacy and new digital projects into 
WorldCat, in order to expose the metadata for the library's unique resources on 
a global platform and establish OCLC record identifiers for local and external 
use. OCLC's WorldCat Metadata API presented a new opportunity to effectively 
meet this need. The library has developed a command-line batch processing tool 
that uses the Metadata API to create and update records using MODS-derived 
MARCXML. This presentation will explore the details of our workflow, how the 
tool has been developed to support our use cases, and what we've learned about 
the API so far. The tool is currently being piloted at Dartmouth to create 
master records in WorldCat for digital dissertations and archival posters. Its 
use may be expanded to a variety of other WorldCat-connected cataloging 
workflows for local collections. Written in Ruby, it provides detailed logging 
and reporting capabilities, and builds on code previously released by the OCLC 
Developer Network and Terry Reese.


Annie and Hayley

Co-chairs TSWEIG

Margaret "Annie" Glerum
Head of Complex Cataloging
Department of Cataloging & Description
Division of Special Collections & Archives
Florida State University Libraries
850-644-4839
agle...@fsu.edu

Heylicken Moreno
Resource Description Coordinator
University of Houston Libraries