We need to embed both the page image and the hOCR text in the PDF file so that the PDF becomes searchable.
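To make the requirement concrete, here is a minimal sketch of the first half of that job: reading word boxes out of hOCR and mapping them to PDF page coordinates, so that a PDF writer could place the text invisibly over the page image. It assumes the hOCR marks words as `ocrx_word` elements with a `title="bbox x0 y0 x1 y1"` attribute in pixels, and that the scan resolution is known. The names `HocrWords`, `parse_hocr_words` and `to_pdf_rect` are hypothetical, and the step of actually writing the PDF is not shown.

```python
import re
from html.parser import HTMLParser

# hOCR stores boxes in the title attribute as "bbox x0 y0 x1 y1" (pixels).
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWords(HTMLParser):
    """Collect (text, bbox) pairs for elements with class 'ocrx_word'.

    Assumption: words are emitted as <span class="ocrx_word" ...>.
    """

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if "ocrx_word" in classes:
            match = BBOX_RE.search(attributes.get("title") or "")
            if match:
                self._bbox = tuple(int(g) for g in match.groups())
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a word element.
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._bbox is not None and tag == "span":
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None


def parse_hocr_words(hocr_text):
    """Return a list of (word, (x0, y0, x1, y1)) in image pixels."""
    parser = HocrWords()
    parser.feed(hocr_text)
    return parser.words


def to_pdf_rect(bbox, image_height_px, dpi):
    """Convert a pixel bbox (origin top-left) to PDF points (origin bottom-left)."""
    x0, y0, x1, y1 = bbox
    left = x0 * 72.0 / dpi
    right = x1 * 72.0 / dpi
    bottom = (image_height_px - y1) * 72.0 / dpi
    top = (image_height_px - y0) * 72.0 / dpi
    return (left, bottom, right, top)
```

The y axis is flipped because hOCR measures from the top of the image while PDF measures from the bottom of the page. The resulting rectangles are where the invisible text would go on top of the embedded image.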
Thanks,
Raghu

On Sun, Feb 22, 2009 at 5:46 PM, Thomas Breuel <[email protected]> wrote:

> On Sun, Feb 22, 2009 at 19:46, Raghu Udupa <[email protected]> wrote:
>
>> Thanks Faisal.
>>
>> I am planning to use ocropus/tesseract for TIFF to OCR conversion.
>>
>> I was looking for a reliable HOCR to PDF conversion program on Linux
>> platform with a C/C++ API or a program that can be called on command line.
>
> There are actually several different kinds of hOCR-to-PDF conversions
> possible; some convert the hOCR/HTML text itself to PDF, others embed the
> page image and use the hOCR info just for searching.
>
> We're going to be focusing on that as part of a new project later in the
> year.
>
> For now, we're working hard on getting the next release of OCRopus out.
>
> Tom
