I wonder if any of you might have experience with creating text PDFs
from TIFFs. I've been using tiffcp to stitch TIFFs together into a
single image and then using tiff2pdf to generate PDFs from the single
TIFF. I've had to pass this image-based
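For reference, the stitch-then-convert step described above can be sketched in Python as below. This is a minimal sketch that assumes the libtiff command-line tools `tiffcp` and `tiff2pdf` are on PATH. The helper names and the intermediate file name `combined.tif` are hypothetical. `tiffcp` takes the source files first and the destination last, and `tiff2pdf -o` names the output PDF. The result is still an image-only PDF, which is the problem this thread is about.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def tiffcp_command(pages: Sequence[str], combined: str) -> List[str]:
    """argv that stitches page TIFFs (in list order) into one multi-page TIFF.

    tiffcp treats the last path on the command line as the destination.
    """
    if not pages:
        raise ValueError("need at least one input TIFF")
    return ["tiffcp", *pages, combined]


def tiff2pdf_command(combined: str, pdf: str) -> List[str]:
    """argv that converts the multi-page TIFF to a PDF; -o names the output."""
    return ["tiff2pdf", "-o", pdf, combined]


def tiffs_to_pdf(pages: Sequence[str], pdf: str, workdir: str = ".") -> None:
    """Run both steps; raises CalledProcessError if either tool fails."""
    combined = str(Path(workdir) / "combined.tif")  # hypothetical intermediate name
    subprocess.run(tiffcp_command(pages, combined), check=True)
    subprocess.run(tiff2pdf_command(combined, pdf), check=True)
```

Usage would look like `tiffs_to_pdf(["page1.tif", "page2.tif"], "book.pdf")`. The commands are built as argument lists rather than shell strings so that file names containing spaces need no quoting.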
You might want to look at ABBYY Fine Reader 9.0 Professional, which can be
driven from the command line. Fine Reader is used at the Library of
Congress. Here is an info link to get you started (search command):
If you haven't already, take a look at Tesseract (
http://code.google.com/p/tesseract-ocr/). There's some discussion of using
Tesseract and shell scripting to work with TIFFs to PDFs to OCR'd text,
which isn't exactly what you want to do, I know, but may prove helpful
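A scripted per-page Tesseract run of the kind mentioned above might look like the following minimal Python sketch. It assumes a `tesseract` binary on PATH with English language data installed, and it uses the basic invocation `tesseract image outputbase -l lang`, which writes its text to `outputbase.txt`. The helper names are hypothetical, and joining pages with form feeds is just one convenient choice.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def tesseract_command(image: str, out_base: str, lang: str = "eng") -> List[str]:
    """argv for one page; tesseract writes its text to out_base + '.txt'."""
    return ["tesseract", image, out_base, "-l", lang]


def text_path(out_base: str) -> Path:
    # Append rather than use with_suffix(), so names like "page.1" stay intact.
    return Path(out_base + ".txt")


def ocr_pages(images: Sequence[str], out_dir: str, lang: str = "eng") -> str:
    """OCR each page image in order and return the pages joined by form feeds."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    pages = []
    for image in images:
        base = str(out / Path(image).stem)
        subprocess.run(tesseract_command(image, base, lang), check=True)
        pages.append(text_path(base).read_text(encoding="utf-8"))
    return "\f".join(pages)
```

This yields plain text only. The text is not attached to the TIFF or PDF pages it came from.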
This is somewhat off-topic, since you asked for something you can use
on Linux. In any case...
I've been using OmniPage 16, and I'm sorry to say I can't recommend
it. You can't run it from the command line, so you can't really
integrate it into a script. It does have a batch manager, so you can
***Cross-posted; apologies for duplication***
The eXtensible Catalog Project is pleased to announce that we have
launched our new website at http://www.extensiblecatalog.org/. This new
website will be the main vehicle for distributing our open-source
software once it is released in 2009. In
Hello all,
My name is Joanna White and I am the Product Manager for the WorldCat
Registry. The WorldCat Registry is a directory of libraries and services
they provide. Through a secure webtool, libraries can manage and share
information about their institutional identity, and makes institutional
I joined the group just today, too late to vote, but what I see is 23
votes for Boston and 43 for anywhere else. Shouldn't there at least be a
runoff?
-Original Message-
From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of Jay Luker
Sent: Wednesday, October 15,
Vacancy Number: 0600774
Position Title: Analyst Programmer Intermediate
Type of Position: Regular Staff
Department: Library
Duties: Reporting to the Web Development Librarian, the Analyst Programmer
develops, maintains, and troubleshoots web based applications in support of the
University
FYI: note below sent out to Karen Calhoun in the [EMAIL PROTECTED]
'OCLC would be required to work with the Library of Congress as the producer
of the NAF data before OCLC could create the NAF notification service'
Greetings Karen,
Per Roy's statement at the top, I have
And beyond Tesseract is Ocropus (http://code.google.com/p/ocropus/),
which uses Tesseract (and eventually other ocr engines) to generate
positional OCR in an HTML format. I wonder if you could process that
HTML slightly to put the TIFF in the background, then use an HTML to PDF
tool to generate
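The positional HTML idea above depends on reading coordinates back out of the markup. Here is a minimal Python sketch that assumes the output follows hOCR conventions: each recognized line is an element whose class includes `ocr_line`, with a `title` attribute such as `bbox x0 y0 x1 y1` giving pixel coordinates from the top-left of the image. It collects only line-level boxes and ignores word-level elements. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

BBox = Tuple[int, int, int, int]  # x0, y0, x1, y1 in image pixels
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}  # no end tag in HTML


def parse_bbox(title: str) -> Optional[BBox]:
    # hOCR title properties are ';'-separated, e.g. "bbox 10 20 300 45; x_wconf 90".
    for prop in title.split(";"):
        parts = prop.split()
        if len(parts) == 5 and parts[0] == "bbox":
            x0, y0, x1, y1 = (int(p) for p in parts[1:])
            return (x0, y0, x1, y1)
    return None


class HocrLineParser(HTMLParser):
    """Collect (bbox, text) for every element whose class includes ocr_line."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: List[Tuple[BBox, str]] = []
        self._depth = 0  # element nesting depth inside the current ocr_line
        self._bbox: Optional[BBox] = None
        self._buf: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            if tag not in VOID_TAGS:
                self._depth += 1
            return
        a = dict(attrs)
        if "ocr_line" in (a.get("class") or "").split():
            self._bbox = parse_bbox(a.get("title") or "")
            self._depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            if self._bbox is not None:
                # Collapse runs of whitespace left over from the markup.
                self.lines.append((self._bbox, " ".join("".join(self._buf).split())))
            self._bbox = None

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)
```

After `p = HocrLineParser(); p.feed(html); p.close()`, `p.lines` holds each line's box and text. Those positions are what you would use to line text up over the page image.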
Forwarded on behalf of Carl Lagoze and the OAI-ORE authoring team...
Begin forwarded message:
From: Carl Lagoze [EMAIL PROTECTED]
Date: October 17, 2008 4:02:14 PM EDT
To: Tim DiLauro [EMAIL PROTECTED]
Subject: Please disseminate - Release of Version 1.0 Production OAI
Object Reuse and
Ya'aqov,
Why don't you consider contacting the NACO program at the Library of
Congress? They would be more equipped to answer your questions.
Mark Matienzo
Applications Developer, Digital Experience Group
The New York Public Library
Same for me on FF3. Also, the same error on IE 7 and Safari 3 for
Windows. All browsers are identified as IE 6.
Windows XP SP 2.
---
David Cloutman [EMAIL PROTECTED]
Electronic Services Librarian
Marin County Free Library
-Original Message-
From: Code for Libraries [mailto:[EMAIL
I'm having the same problem with Safari 3.1.1 on OS X, which the site thinks is
also IE 6 on Windows XP. I haven't encountered this problem in years!
--
Brenda Chawner
Senior Lecturer LIM Programmes Director
School of Information Management
Victoria University of Wellington
P O Box 600,
I used Internet Explorer 7 to go to this website, and I get the message "You
are using *Internet Explorer* version *6.0* on *Windows XP*".
-Chris Alhambra
On Fri, Oct 17, 2008 at 4:11 PM, Mark A. Matienzo [EMAIL PROTECTED] wrote:
I'm using Firefox 3 on OS X and the project's website is claiming I'm
I'm running FF3 on Ubuntu. No dice.
Tried it in Opera 9.x in Ubuntu. Still doesn't work.
On Fri, Oct 17, 2008 at 4:17 PM, Chris Alhambra [EMAIL PROTECTED] wrote:
I used Internet Explorer 7 to go to this website, and I get the message You
are using *Internet Explorer* version *6.0* on
The site was working fine earlier, as I was able to view it with Opera
(now, of course, I have the same problem).
For the time being, this should get you there:
http://www.extensiblecatalog.org/node/59
-Original Message-
From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
Yes, I've tried Tesseract and found it to be pretty accurate, but I
don't believe there is a way to integrate the text back into the PDF.
It's easy to pull text out of image-based PDFs, but not to put the text
back in. Driving me crazy...
Thanks for
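On the problem of putting text back into the PDF: one common technique is to draw the scanned page as an image and then overlay the OCR text using PDF text rendering mode 3 (`3 Tr`), which is neither filled nor stroked. The text stays invisible but remains searchable and selectable. The Python sketch below only builds a page content stream from line boxes like those in hOCR. It assumes pixel boxes with a top-left origin, a known scan resolution, ASCII text, and hypothetical resource names `/Im0` (the page image XObject) and `/F1` (a font). Assembling a complete PDF with objects, resources, and an xref table is out of scope. The text is not stretched to the box width, so highlights may only roughly line up.

```python
from typing import Iterable, Tuple

BBox = Tuple[int, int, int, int]  # pixel box: x0, y0 (top-left), x1, y1 (bottom-right)


def pdf_escape(text: str) -> str:
    # Backslash and parentheses must be escaped inside a PDF literal string.
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def text_layer(lines: Iterable[Tuple[BBox, str]], page_h_px: int,
               dpi: int = 300, font: str = "/F1") -> str:
    """Invisible text operators, one positioned string per OCR line."""
    scale = 72.0 / dpi  # image pixels -> PDF points
    ops = ["BT", "3 Tr"]  # render mode 3: text is invisible but still extractable
    for (x0, y0, x1, y1), text in lines:
        size = max((y1 - y0) * scale, 1.0)  # font size roughly = line height
        x = x0 * scale
        y = (page_h_px - y1) * scale  # flip y: PDF origin is bottom-left
        ops.append(f"{font} {size:.2f} Tf")
        ops.append(f"1 0 0 1 {x:.2f} {y:.2f} Tm")
        ops.append(f"({pdf_escape(text)}) Tj")
    ops.append("ET")
    return "\n".join(ops)


def page_content(lines: Iterable[Tuple[BBox, str]], page_w_px: int,
                 page_h_px: int, dpi: int = 300, image: str = "/Im0") -> str:
    """Paint the scanned image over the full page, then the hidden text layer."""
    scale = 72.0 / dpi
    w, h = page_w_px * scale, page_h_px * scale
    draw_image = f"q {w:.2f} 0 0 {h:.2f} 0 0 cm {image} Do Q"
    return draw_image + "\n" + text_layer(lines, page_h_px, dpi)
```

The text layer is drawn after the image on purpose. Even if a viewer ignores the render mode, the text would sit on top of the image rather than be hidden behind it.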
Thanks for the tip. Especially the part where you make it clear that
OmniPage doesn't really work. Back to Acrobat, I guess.
Thanks all!
Jonathan Brinley wrote:
This is somewhat off-topic, since you asked for something you can use
on Linux. In