I have found two solutions. The first is pdfbeads. At first it did not
work on Arch Linux because of a bug in the package. That bug is now
fixed, and I can merge an hOCR file with an image. However, I think the
quality of the resulting PDF could be better.

Another option is HocrConverter. There are several versions around; I
used this one: <https://github.com/ryanfb/HocrConverter>. The original
post is here: <http://xplus3.net/2009/04/02/convert-hocr-to-pdf/>. I had
to update the script for Python 3. In the end, the quality of the PDF
seems better. When you search for a word in the PDF, the word is
highlighted with a box. The position of that box seems slightly less
accurate than with pdfbeads, but it is good enough for me.
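
For anyone wondering where the highlight box comes from: in the hOCR
file each word carries a "bbox x0 y0 x1 y1" entry in its title
attribute, in image pixels measured from the top-left corner, while PDF
coordinates are in points (1/72 inch) measured from the bottom-left. The
sketch below is only my illustration of that mapping, not the code of
pdfbeads or HocrConverter. The names WordBoxParser and to_pdf_rect are
made up for this example, and the image resolution (dpi) is assumed to
be known.

```python
import re
from html.parser import HTMLParser

# hOCR stores the box as "bbox x0 y0 x1 y1" inside the title attribute.
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class WordBoxParser(HTMLParser):
    """Collect (text, bbox) pairs for every ocrx_word span (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None   # bbox of the word currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "ocrx_word" in (attrs.get("class") or "").split():
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(g) for g in match.groups())
                self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._bbox is not None:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None


def to_pdf_rect(bbox, dpi, page_height_px):
    """Map a pixel bbox (top-left origin) to PDF points (bottom-left origin).

    Assumes the same dpi horizontally and vertically.
    Returns (left, bottom, right, top).
    """
    x0, y0, x1, y1 = bbox
    scale = 72.0 / dpi
    return (x0 * scale,
            (page_height_px - y1) * scale,
            x1 * scale,
            (page_height_px - y0) * scale)


sample = """<span class='ocr_line' title='bbox 36 92 580 122'>
<span class='ocrx_word' title='bbox 36 92 96 116; x_wconf 95'>Hello</span>
<span class='ocrx_word' title='bbox 109 92 179 116; x_wconf 93'>world</span>
</span>"""

parser = WordBoxParser()
parser.feed(sample)
for text, bbox in parser.words:
    print(text, to_pdf_rect(bbox, dpi=300, page_height_px=3300))
```

If the dpi used for the conversion does not match the real resolution of
the image, every box is shifted and scaled, which may be one reason why
the highlight does not sit exactly on the word.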

Perhaps these posts can help somebody.

Regards,

Cédric
