I just tested hocr2pdf, and amazingly you're right, it doesn't seem to support UTF-8. Which is pretty shocking.
> maybe you can try alternative solution ;-) [1]. It was created by google(I
> think ;-) ) and there is visible contributor e-mail if it does not work :-)
>
> https://code.google.com/p/hocr-tools/source/browse/hocr-pdf

Zdenko's correct, this is much better. As was mentioned it isn't documented. I'll try to correct this soon, but in the meantime some pointers:

- It requires the 'reportlab' package for python. On a Debian based system the appropriate package is called 'python-reportlab'.
- I had to change line 46 from
      dpi = im.info['dpi']
  to
      dpi = im.info['dpi'][0]
- It expects .jpg and .hocr files, named the same per page, and in the same directory.

It's then run like this:

    hocr-pdf my-directory

Hopefully that's enough to be getting along with. As I say I'll try to write up a basic manpage for hocr-pdf, and make the fix on line 46 general enough to be applied.

Nick
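For what it's worth, here is a rough sketch of what a more general version of the line 46 fix might look like. This is not what hocr-pdf actually does; the helper name normalize_dpi and the fallback value of 300 are my own assumptions, chosen only for illustration. The idea is to accept either a single number or an (x, y) pair from the image's 'dpi' entry, and fall back to a default when the entry is missing.

```python
def normalize_dpi(dpi_info, default=300):
    """Return one DPI value from an image's 'dpi' metadata entry.

    dpi_info may be a single number, an (x, y) pair, or None when the
    image carries no DPI metadata. The default of 300 is an assumption,
    not something hocr-pdf specifies.
    """
    if dpi_info is None:
        return default
    if isinstance(dpi_info, (tuple, list)):
        # Use the horizontal resolution, as in the manual fix above.
        return dpi_info[0] if dpi_info else default
    return dpi_info
```

With something like that in place, line 46 could presumably become dpi = normalize_dpi(im.info.get('dpi')), which would also cope with images that have no DPI entry at all.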

