Did you ask the author of hocr2pdf this question?

Zdenko


On Sun, Jan 26, 2014 at 9:25 PM, peiman F. <[email protected]> wrote:

> Hi,
> I have a PDF file that I need to make searchable.
> The PDF is in Arabic.
>
> I can OCR a single page with tesseract without any problem,
> but when I use hocr2pdf, the output PDF file has some mistakes!
>
> 1.pdf is my source file,
> and 123.pdf is the 4th page, made searchable by hocr2pdf.
>
> I used commands like this:
>
> 1)
> tesseract 1/out04.jpg 552233 -l ara hocr
>
> 2)
> hocr2pdf -i 1/out04.jpg -s -o 123.pdf < 552233.html
>
> Am I doing something wrong, or does hocr2pdf not support UTF-8?
>
> --
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
>
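
Regarding the UTF-8 question in the quoted message: one way to narrow
it down before contacting the hocr2pdf author might be to check
whether the hOCR file produced in step 1 is itself valid UTF-8. The
sketch below is only an illustration. The helper name is_valid_utf8 is
hypothetical (it is not part of tesseract or hocr2pdf), and it assumes
the standard iconv utility is installed.

```shell
#!/bin/sh
# Hypothetical helper, not part of tesseract or hocr2pdf:
# succeeds only if the given file decodes cleanly as UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# If a file name was passed on the command line, report on it.
# Assumed usage: sh check_hocr.sh 552233.html
if [ "$#" -ge 1 ]; then
    if is_valid_utf8 "$1"; then
        echo "$1: valid UTF-8"
    else
        echo "$1: NOT valid UTF-8"
    fi
fi
```

If the file passes this check, the input encoding is probably not the
cause, and the mistakes more likely come from how the text layer is
written into the PDF. That would be a question for the hocr2pdf author.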

