Matt,
A Word document does funny things to the text since it is actually HTML (try
opening a .doc in a plain text editor and you will see it is HTML). I would try
to get the plain ASCII text instead, and then install Cygwin, which contains
sed and a bunch of other useful Unix/Linux commands.
Hm, after a little looking on someone's suggestion, it turns out I was
wrong: they are not line breaks, they are paragraph marks.
On Tue, Aug 4, 2015 at 9:21 AM, Scancella, John j...@loc.gov wrote:
Matt,
A Word document does funny things to the text since it is actually HTML (try
opening a
I am on Windows machines, so I don't have quite the easy access to
that useful command. Someone had earlier put the OCR in a doc file so
I've been playing with that more than with the raw PDF OCR.
On Tue, Aug 4, 2015 at 8:19 AM, Scancella, John j...@loc.gov wrote:
Matt,
There are probably a
Matt,
There are probably a dozen ways to do this, but it would be really helpful to
know what operating system you are on. For example, if you are using Linux, you
can run it through sed using
cat OCR_FILE | sed 's/\n//' > STRIPPED_OCR_FILE
see http://stackoverflow.com/a/800644/2896744 for
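A rough sketch of the sed approach described above. Because sed reads its input one line at a time and strips the trailing newline before applying the script, a bare s/\n// never sees a newline to remove. A commonly cited workaround pulls the whole file into the pattern space first and substitutes afterwards. This assumes GNU sed (as shipped with most Linux distributions and with Cygwin); join_lines is a hypothetical helper name, and OCR_FILE and STRIPPED_OCR_FILE are the placeholder names from the message.

```shell
# Sketch only: assumes GNU sed. join_lines is a hypothetical helper name.
join_lines() {
  # :a        define a label named a
  # N         append the next input line to the pattern space
  # $!ba      unless this is the last line, jump back to label a
  # s/\n/ /g  replace every embedded newline with a single space
  sed ':a;N;$!ba;s/\n/ /g' "$1"
}
```

It could be called as `join_lines OCR_FILE > STRIPPED_OCR_FILE`. Replacing each newline with a space, rather than deleting it, keeps the last word of one line from fusing with the first word of the next.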
Information Sciences and Business Liaison Librarian
Pennsylvania State University
University Park, PA
The Pennsylvania State University Libraries seek a creative and service-
oriented information sciences and business liaison librarian for a tenure-
track faculty position, serving as the subject
Records Analyst
University of Toronto
Cayuga, Ontario
**Records Analyst**
**Organization:** Haldimand County
**City:** Cayuga
**Province/State:** Ontario
**Country:** Canada
**Category:** Records Management
**Job type:** Full-time
**Duration:** Permanent
Librarian II - Spanish Materials Specialist
Aurora Public Library
Aurora, Illinois
Librarian II - Spanish Materials Specialist
Aurora Public Library,
Aurora, Illinois
Salary: Starting at $49,795
Status: Full-time
Posted: 07/29/15
Deadline:
Librarian II - Spanish Materials
University Archivist (Tennessee Tech University, Tennessee)
Tennessee Technological University
Cookeville, Tennessee
University Archivist
Tennessee Tech University,
Cookeville, Tennessee
Salary: Not Specified
Status: Full-time
Posted: 07/30/15
Deadline: 09/09/15
Russian Cataloger (Hoover Institution, California)
Hoover Institution Library and Archives
Stanford University
**Russian Cataloger - 67759**
**Description**
The Hoover Institution is seeking qualified candidates for the position of
Russian Cataloger. The position is a full-time
Manager, Client Services Library Facilities
Thompson Rivers University
Kamloops, British Columbia
Manager, Client Services Library Facilities - (01220)
**Application Restrictions**
Open to both internal and external
**Job Type**
Administrative/Management
**Posting In effect
Research and Instruction Librarian (Texas A&M University - Commerce, Texas)
Texas A&M University–Commerce
Commerce, Texas
_**Research and Instruction Librarian**_
**Position Information**
**Position Title:** Research and Instruction Librarian
**Posting Title:** Research and Instruction
Librarian I - Spanish Language Services (Aurora Public Library, Illinois)
Aurora Public Library
Aurora
**Librarian I - Spanish Language Services**
**Santori Public Library**
The Aurora Public Library, Aurora, Illinois seeks a full-time bi-lingual
Librarian I for the new Richard & Gina
Chair, Cataloging and Discovery Services
University of Florida
Gainesville, FL
Posting Details
Posting Number: 0800488
Job Title: Digital Collections Curator (Staff Assistant,
SL-2)
Application Deadline: 08-30-2015
Department: Library
Full-Time or Part-Time: Full-Time
Part-time %:
On Tue, Aug 4, 2015 at 6:09 AM, Matt Sherman matt.r.sher...@gmail.com
wrote:
I am on Windows machines, so I don't have quite the easy access to
that useful command. Someone had earlier put the OCR in a doc file so
I've been playing with that more than with the raw PDF OCR.
Versions of the
That worked pretty well. There is still some cleanup I have to do,
but replacing [A-z]^p[A-z] with [A-z] [A-z] did a lot of the cleanup.
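Outside Word, the same find-and-replace can be sketched with sed: a line break that sits directly between two letters becomes a space, while blank lines between paragraphs are left alone. This is only a sketch, assuming GNU sed with the -E option; join_wrapped is a hypothetical helper name. The class [A-Za-z] is used here instead of [A-z] because, in regex engines that follow ASCII order, [A-z] also matches the few punctuation characters that fall between Z and a.

```shell
# Sketch only: assumes GNU sed. join_wrapped is a hypothetical helper name.
join_wrapped() {
  # :a;N;$!ba  read the whole file into the pattern space
  # :b ... tb  repeat the substitution until it no longer matches, so that
  #            runs of single-letter lines are also joined
  # The substitution turns letter, newline, letter into letter, space, letter.
  sed -E ':a;N;$!ba;:b;s/([A-Za-z])\n([A-Za-z])/\1 \2/;tb' "$1"
}
```

Breaks that follow punctuation, such as the end of a sentence, are not touched, so some manual cleanup would still be needed.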
On Tue, Aug 4, 2015 at 12:17 PM, Kyle Banerjee kyle.baner...@gmail.com wrote:
On Tue, Aug 4, 2015 at 6:09 AM, Matt Sherman matt.r.sher...@gmail.com
wrote:
I am on
Historical Records Project Archivist
Union College
Schenectady, NY
Historical Records Project Archivist (Term Appt. Ending June 30, 2017)
Union College,
Schenectady, New York
Salary: Not Specified
Status: Full-time
Posted: 07/29/15
Deadline:
Historical Records Project Archivist