If you already have PDFs, why are you also storing images? PDF is an open
international standard (ISO 32000) that offers not only a richer content model
(including text, vector and raster) but also metadata, marginalia and more,
using modern compression methods. TIFF, on the other hand, is a proprietary
standard (that hasn't been updated since 1992) that only handles raster images
and metadata.
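
If you do end up rebuilding a PDF from the image plus the word boxes, the
boxes would first have to be mapped into PDF coordinates. Below is a minimal
sketch of that one step, not a full solution. It assumes your bounding box
data is the XHTML that "pdftotext -bbox" writes (page elements with
width/height, word elements with xMin/yMin/xMax/yMax, top-left origin). The
function names and the sample data are made up for illustration, and placing
the text over the image would still need a PDF-writing library of your
choice, with the image scaled to the page size.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://www.w3.org/1999/xhtml}'."""
    return tag.rsplit("}", 1)[-1]


def words_in_pdf_space(bbox_xhtml):
    """Return one dict per word with its box in PDF user space.

    Input boxes are assumed to use a top-left origin (y grows downward).
    PDF user space uses a bottom-left origin, so y is flipped per page.
    """
    words = []
    page_no = 0
    for page in ET.fromstring(bbox_xhtml).iter():
        if local_name(page.tag) != "page":
            continue
        page_no += 1
        page_height = float(page.get("height"))
        for word in page.iter():
            if local_name(word.tag) != "word":
                continue
            x_min, y_min = float(word.get("xMin")), float(word.get("yMin"))
            x_max, y_max = float(word.get("xMax")), float(word.get("yMax"))
            words.append({
                "page": page_no,
                "text": word.text or "",
                "x": x_min,
                "y": page_height - y_max,  # bottom edge of the box
                "width": x_max - x_min,
                "height": y_max - y_min,
            })
    return words


# Hypothetical sample, shaped like the assumed bounding box output.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body><doc>
<page width="612.0" height="792.0">
<word xMin="72.0" yMin="100.0" xMax="122.0" yMax="112.0">Daily</word>
<word xMin="126.0" yMin="100.0" xMax="170.0" yMax="112.0">News</word>
</page>
</doc></body></html>"""

for w in words_in_pdf_space(SAMPLE):
    print(w)
```

Each resulting box could then be used to place the word as invisible text on
top of the page image, which is roughly how a searchable scanned PDF is laid
out.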

Leonard

From: Mark Ehle <[email protected]>
Date: Wednesday, May 7, 2014 at 8:27 PM
To: [email protected]
Subject: [poppler] Combine bounding box data and tiff to create pdf?

Folks -

I am using pdftotext to extract text from PDF files in a digital newspaper
archive I am creating for a local public library. So far, it's working great.
But I am using up a fair amount of disk space and would like to figure out a
way to create an OCR'd PDF from an image and the bounding box data. That way I
would not have to store the PDF files as well as the images. Is there a way to
do that?

Thanks -

Mark
_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler
