Re: [CODE4LIB] Looking for Ideas on Line Breaks in OCR Text

2015-08-04 Thread Scancella, John
Matt, A Word document does funny things to the text since it is actually html (try opening a .doc in a plain text editor and you will see it is html). I would try to get the plain ASCII text instead, and then install Cygwin, which contains sed and a bunch of other useful Unix/Linux commands.

Re: [CODE4LIB] Looking for Ideas on Line Breaks in OCR Text

2015-08-04 Thread Matt Sherman
Hm, doing a little looking on someone's suggestion it turns out I was wrong, they are not line breaks, they are paragraph marks. On Tue, Aug 4, 2015 at 9:21 AM, Scancella, John j...@loc.gov wrote: Matt, A word document does funny things to the text since it is actually html (try opening a

Re: [CODE4LIB] Looking for Ideas on Line Breaks in OCR Text

2015-08-04 Thread Matt Sherman
I am on Windows machines, so I don't have quite the easy access to that useful command. Someone had earlier put the OCR in a doc file so I've been playing with that more than with the raw PDF OCR. On Tue, Aug 4, 2015 at 8:19 AM, Scancella, John j...@loc.gov wrote: Matt, There are probably a

Re: [CODE4LIB] Looking for Ideas on Line Breaks in OCR Text

2015-08-04 Thread Scancella, John
Matt, There are probably a dozen ways to do this, but it would be really helpful to know what operating system you are on. For example, if you are using Linux, you can run it through sed using cat OCR_FILE | sed 's/\n//' > STRIPPED_OCR_FILE see http://stackoverflow.com/a/800644/2896744 for
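For readers without sed (for example on Windows), the same idea of collapsing every line break in the OCR text can be sketched in Python. This is only an illustration, not something from the thread: the function name join_lines and the sample string are made up for the example.

```python
def join_lines(text: str) -> str:
    """Collapse every line break in `text` into a single space."""
    # str.splitlines() recognizes \n, \r\n and \r (among other line
    # boundaries), so Windows and Unix line endings are both handled.
    pieces = (line.strip() for line in text.splitlines())
    # Drop empty lines so blank lines do not turn into double spaces.
    return " ".join(piece for piece in pieces if piece)


if __name__ == "__main__":
    # Hypothetical OCR fragment with mixed line endings.
    sample = "The quick brown\r\nfox jumps over\nthe lazy dog."
    print(join_lines(sample))
```

Note that this removes all line breaks, including real paragraph boundaries, so it suits text that is meant to end up as one running block.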

[CODE4LIB] Job: Information Sciences and Business Liaison Librarian at Pennsylvania State University

2015-08-04 Thread jobs
Information Sciences and Business Liaison Librarian Pennsylvania State University University Park, PA The Pennsylvania State University Libraries seek a creative and service-oriented information sciences and business liaison librarian for a tenure-track faculty position, serving as the subject

[CODE4LIB] Job: Records Analyst at University of Toronto

2015-08-04 Thread jobs
Records Analyst University of Toronto Cayuga, Ontario **Records Analyst** **Organization: **Haldimand County **City: **Cayuga **Province/State: **Ontario **Country: **Canada **Category: **Records Management **Job type: **Full-time **Duration: **Permanent

[CODE4LIB] Job: Librarian II - Spanish Materials Specialist at Aurora Public Library

2015-08-04 Thread jobs
Librarian II - Spanish Materials Specialist Aurora Public Library Aurora, Illinois Librarian II - Spanish Materials Specialist Aurora Public Library, Aurora, Illinois Salary: Starting at $49,795 Status: Full-time Posted: 07/29/15 Deadline: Librarian II - Spanish Materials

[CODE4LIB] Job: University Archivist (Tennessee Tech University, Tennessee) at Tennessee Technological University

2015-08-04 Thread jobs
University Archivist (Tennessee Tech University, Tennessee) Tennessee Technological University Cookeville, Tennessee University Archivist Tennessee Tech University, Cookeville, Tennessee Salary: Not Specified Status: Full-time Posted: 07/30/15 Deadline: 09/09/15

[CODE4LIB] Job: Russian Cataloger (Hoover Institution, California) at Hoover Institution Library and Archives

2015-08-04 Thread jobs
Russian Cataloger (Hoover Institution, California) Hoover Institution Library and Archives Stanford University **Russian Cataloger - 67759** **Description** The Hoover Institution is seeking qualified candidates for the position of Russian Cataloger. The position is a full-time

[CODE4LIB] Job: Manager, Client Services Library Facilities at Thompson Rivers University

2015-08-04 Thread jobs
Manager, Client Services Library Facilities Thompson Rivers University Kamloops, British Columbia Manager, Client Services Library Facilities - (01220) **Application Restrictions** Open to both Internal and external **Job Type** Administrative/Management **Posting In effect

[CODE4LIB] Job: Research and Instruction Librarian (Texas A&M University - Commerce, Texas) at Texas A&M University–Commerce

2015-08-04 Thread jobs
Research and Instruction Librarian (Texas A&M University - Commerce, Texas) Texas A&M University–Commerce Commerce, Texas _**Research and Instruction Librarian**_ **Position Information** **Position Title:** Research and Instruction Librarian **Posting Title: **Research and Instruction

[CODE4LIB] Job: Librarian I - Spanish Language Services (Aurora Public Library, Illinois) at Aurora Public Library

2015-08-04 Thread jobs
Librarian I - Spanish Language Services (Aurora Public Library, Illinois) Aurora Public Library Aurora **Librarian I - Spanish Language Services** **Santori Public Library** The Aurora Public Library, Aurora, Illinois seeks a full-time bi-lingual Librarian I for the new Richard & Gina

[CODE4LIB] Job: Chair, Cataloging and Discovery Services at University of Florida

2015-08-04 Thread jobs
Chair, Cataloging and Discovery Services University of Florida Gainesville, FL Posting Details Posting Number: 0800488 Job Title: Digital Collections Curator (Staff Assistant, SL-2) Application Deadline: 08-30-2015 Department: Library Full-Time or Part-Time: Full-Time Part-time %:

Re: [CODE4LIB] Looking for Ideas on Line Breaks in OCR Text

2015-08-04 Thread Kyle Banerjee
On Tue, Aug 4, 2015 at 6:09 AM, Matt Sherman matt.r.sher...@gmail.com wrote: I am on Windows machines, so I don't have quite the easy access to that useful command. Someone had earlier put the OCR in a doc file so I've been playing with that more than with the raw PDF OCR. Versions of the

Re: [CODE4LIB] Looking for Ideas on Line Breaks in OCR Text

2015-08-04 Thread Matt Sherman
That worked pretty well. There is still some cleanup I have to do but [A-z]^p[A-z] to [A-z] [A-z] did a lot of the cleanup. On Tue, Aug 4, 2015 at 12:17 PM, Kyle Banerjee kyle.baner...@gmail.com wrote: On Tue, Aug 4, 2015 at 6:09 AM, Matt Sherman matt.r.sher...@gmail.com wrote: I am on
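The letter, paragraph mark, letter replacement described above can be approximated outside Word with a regular expression. The following is a rough sketch, assuming the OCR has been exported as plain text; the names BREAK_BETWEEN_LETTERS and unwrap are invented for the illustration. It uses [A-Za-z] rather than [A-z] because, in most regex engines, the ASCII range A-z also covers a few punctuation characters that sit between Z and a.

```python
import re

# A line break (\r\n, \r or \n) that sits directly between two ASCII letters.
# Lookbehind and lookahead keep the letters themselves out of the match,
# so only the break is replaced.
BREAK_BETWEEN_LETTERS = re.compile(r"(?<=[A-Za-z])(?:\r\n|\r|\n)(?=[A-Za-z])")


def unwrap(text: str) -> str:
    """Replace mid-sentence line breaks with a space; leave other breaks alone."""
    return BREAK_BETWEEN_LETTERS.sub(" ", text)


if __name__ == "__main__":
    # Hypothetical OCR fragment: the first break is mid-sentence, the second
    # follows a period and is therefore kept.
    sample = "the quick\nbrown fox.\nNext paragraph"
    print(unwrap(sample))
```

As in the message above, this leaves some cleanup: breaks after commas, hyphens, or digits are not touched, and a genuine paragraph break between a line ending in a letter and one starting with a letter would be joined.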

[CODE4LIB] Job: Historical Records Project Archivist at Union College

2015-08-04 Thread jobs
Historical Records Project Archivist Union College Schenectady, NY Historical Records Project Archivist (Term Appt. Ending June 30, 2017) Union College, Schenectady, New York Salary: Not Specified Status: Full-time Posted: 07/29/15 Deadline: Historical Records Project Archivist