[CODE4LIB] OCR PDFs

2008-10-17 Thread James Tuttle
I wonder if any of you might have experience with creating text PDFs from TIFFs. I've been using tiffcp to stitch TIFFs together into a single image and then using tiff2pdf to generate PDFs from the single TIFF. I've had to pass this image-based
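
For readers following the thread, the tiffcp/tiff2pdf workflow described above might be scripted roughly as follows. This is only a sketch, not the poster's actual script: it assumes the libtiff tools tiffcp and tiff2pdf are on the PATH, and the function names and file names (build_commands, tiffs_to_pdf, stitched.tif, out.pdf) are hypothetical. The resulting PDF is still image-only; no text layer is added.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(pages, stitched, pdf):
    """Return the two command lines: tiffcp to combine pages, tiff2pdf to convert.

    All file names are caller-supplied placeholders.
    """
    if not pages:
        raise ValueError("need at least one input TIFF")
    # tiffcp takes one or more input files followed by the output file.
    stitch = ["tiffcp", *map(str, pages), str(stitched)]
    # tiff2pdf writes to the file named by -o.
    convert = ["tiff2pdf", "-o", str(pdf), str(stitched)]
    return [stitch, convert]


def tiffs_to_pdf(pages, stitched="stitched.tif", pdf="out.pdf"):
    """Run both steps in order; raises if a tool is missing or a step fails."""
    for cmd in build_commands(pages, stitched, pdf):
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH")
        subprocess.run(cmd, check=True)
    return Path(pdf)
```

Calling tiffs_to_pdf(["page1.tif", "page2.tif"]) would run the two tools in sequence; an OCR step would still be needed afterwards to get searchable text.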

Re: [CODE4LIB] OCR PDFs

2008-10-17 Thread Terry Harrison
You might want to look at ABBYY Fine Reader 9.0 Professional, which can be driven from the command line. Fine Reader is used at the Library of Congress. Here is an info link to get you started (search command):

Re: [CODE4LIB] OCR PDFs

2008-10-17 Thread Bridger Dyson-Smith
If you haven't already, take a look at tesseract (http://code.google.com/p/tesseract-ocr/). There's some discussion of using tesseract and shell scripting to work with tiffs to pdfs to ocr'd text, which isn't exactly what you're wanting to do, I know, but may prove helpful

Re: [CODE4LIB] OCR PDFs

2008-10-17 Thread Jonathan Brinley
This is somewhat off-topic, since you asked for something you can use on Linux. In any case... I've been using OmniPage 16, and I'm sorry to say I can't recommend it. You can't run it from the command line, so you can't really integrate it into a script. It does have a batch manager, so you can

[CODE4LIB] eXtensible Catalog - New Website

2008-10-17 Thread Dibelius, Steven
***Cross-posted; apologies for duplication*** The eXtensible Catalog Project is pleased to announce that we have launched our new website at http://www.extensiblecatalog.org/. This new website will be the main vehicle for distributing our open-source software once it is released in 2009. In

Re: [CODE4LIB] registry of databases

2008-10-17 Thread White,Joanna
Hello all, My name is Joanna White and I am the Product Manager for the WorldCat Registry. The WorldCat Registry is a directory of libraries and services they provide. Through a secure webtool, libraries can manage and share information about their institutional identity, and makes institutional

Re: [CODE4LIB] Vote for NE code4lib meetup location

2008-10-17 Thread Barnett, Jeffrey
I joined myself to the group just today, too late to vote, but what I see is 23 votes for Boston and 43 for anywhere else. Shouldn't there at least be a runoff? -Original Message- From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Jay Luker Sent: Wednesday, October 15,

[CODE4LIB] Job posting: Analyst Programmer Intermediate - Georgia State University Library

2008-10-17 Thread Douglas Goans
Vacancy Number: 0600774 Position Title: Analyst Programmer Intermediate Type of Position: Regular Staff Department: Library Duties: Reporting to the Web Development Librarian, the Analyst Programmer develops, maintains, and troubleshoots web based applications in support of the University

[CODE4LIB] FW: NAF notification service from OCLC

2008-10-17 Thread Ya'aqov Ziso
FYI: note below sent out to Karen Calhoun in the [EMAIL PROTECTED] = 'OCLC would be required to work with the Library of Congress as the producer of the NAF data before OCLC could create the NAF notification service' Greetings Karen, Per Roy's statement at the top, I have

Re: [CODE4LIB] OCR PDFs

2008-10-17 Thread Binkley, Peter
And beyond Tesseract is OCRopus (http://code.google.com/p/ocropus/), which uses Tesseract (and eventually other OCR engines) to generate positional OCR in an HTML format. I wonder if you could process that HTML slightly to put the TIFF in the background, then use an HTML to PDF tool to generate

[CODE4LIB] Fwd: Please disseminate - Release of Version 1.0 Production OAI Object Reuse and Exchange Specifications

2008-10-17 Thread Tim DiLauro
Forwarded on behalf of Carl Lagoze and the OAI-ORE authoring team... Begin forwarded message: From: Carl Lagoze [EMAIL PROTECTED] Date: October 17, 2008 4:02:14 PM EDT To: Tim DiLauro [EMAIL PROTECTED] Subject: Please disseminate - Release of Version 1.0 Production OAI Object Reuse and

Re: [CODE4LIB] FW: NAF notification service from OCLC

2008-10-17 Thread Mark A. Matienzo
Ya'aqov, Why don't you consider contacting the NACO program at the Library of Congress? They would be more equipped to answer your questions. Mark Matienzo Applications Developer, Digital Experience Group The New York Public Library

Re: [CODE4LIB] eXtensible Catalog - New Website

2008-10-17 Thread Cloutman, David
Same for me on FF3. Also, the same error on IE 7 and Safari 3 for Windows. All browsers are identified as IE 6. Windows XP SP 2. --- David Cloutman [EMAIL PROTECTED] Electronic Services Librarian Marin County Free Library -Original Message- From: Code for Libraries [mailto:[EMAIL

Re: [CODE4LIB] eXtensible Catalog - New Website

2008-10-17 Thread Brenda Chawner
I'm having the same problem with Safari 3.1.1 on OS X, which the site thinks is also IE 6 on Windows XP. I haven't encountered this problem in years! -- Brenda Chawner Senior Lecturer LIM Programmes Director School of Information Management Victoria University of Wellington P O Box 600,

Re: [CODE4LIB] eXtensible Catalog - New Website

2008-10-17 Thread Chris Alhambra
I used Internet Explorer 7 to go to this website, and I get the message "You are using *Internet Explorer* version *6.0* on *Windows XP*" -Chris Alhambra On Fri, Oct 17, 2008 at 4:11 PM, Mark A. Matienzo [EMAIL PROTECTED] wrote: I'm using Firefox 3 on OS X and the project's website is claiming I'm

Re: [CODE4LIB] eXtensible Catalog - New Website

2008-10-17 Thread Ethan Gruber
I'm running FF3 on Ubuntu. No dice. Tried it in Opera 9.x in Ubuntu. Still doesn't work. On Fri, Oct 17, 2008 at 4:17 PM, Chris Alhambra [EMAIL PROTECTED] wrote: I used Internet Explorer 7 to go this website, and I get the message You are using *Internet Explorer* version *6.0* on

Re: [CODE4LIB] eXtensible Catalog - New Website

2008-10-17 Thread Custer, Mark
The site was working fine earlier, as I was able to view it with Opera (now, of course, I've the same problems). For the time being, this should get you there: http://www.extensiblecatalog.org/node/59 -Original Message- From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of

Re: [CODE4LIB] OCR PDFs

2008-10-17 Thread James Tuttle
Yes, I've tried tesseract and found it to be pretty accurate, but I don't believe there is a way to integrate the text back into the PDF. It's easy to pull text out of image-based PDFs, but not to put the text back in. Driving me crazy... Thanks for

Re: [CODE4LIB] OCR PDFs

2008-10-17 Thread James Tuttle
Thanks for the tip. Especially the part where you make it clear that OmniPage doesn't really work. Back to Acrobat, I guess. Thanks all! Jonathan Brinley wrote: This is somewhat off-topic, since you asked for something you can use on Linux. In