I have found two solutions. The first is pdfbeads. At first it did not work on Arch Linux because of a bug in the package. That bug is now fixed, and I can merge an hOCR file with an image. For my purposes, though, the quality of the resulting PDF could be better.
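For anyone wondering what the merge involves: each recognized word is placed as invisible text over the page image, at the position given by its hOCR bounding box. Below is a rough sketch of that coordinate mapping only, not the code pdfbeads actually uses. The function name `hocr_bbox_to_pdf`, the 300 dpi default and the sample values are my own assumptions for illustration.

```python
import re

# hOCR stores a word's box in the title attribute as "bbox x0 y0 x1 y1",
# in image pixels, with the origin at the top-left corner of the page.
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


def hocr_bbox_to_pdf(title, page_height_px, dpi=300):
    """Map an hOCR bbox (pixels, top-left origin) to PDF points (bottom-left origin).

    Hypothetical helper: dpi is the scan resolution, assumed to be 300 here.
    """
    match = BBOX_RE.search(title)
    if match is None:
        raise ValueError(f"no bbox found in {title!r}")
    x0, y0, x1, y1 = (int(v) for v in match.groups())

    def to_points(px):
        # One PDF point is 1/72 inch.
        return px * 72.0 / dpi

    # Flip the y axis: PDF measures upwards from the bottom of the page,
    # so the bottom edge of the box (y1) becomes the lower y value.
    return (
        to_points(x0),
        to_points(page_height_px - y1),
        to_points(x1),
        to_points(page_height_px - y0),
    )


# Made-up word on a 3300 px tall page (11 inches at 300 dpi).
print(hocr_bbox_to_pdf("bbox 300 600 900 660; x_wconf 95", page_height_px=3300))
```

If the resolution assumed here differs from the real scan resolution, every box is scaled wrongly, which is one possible reason why the search highlight can end up slightly off.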
Another option is HocrConverter. Several versions exist; I used this one: <https://github.com/ryanfb/HocrConverter>. The original post is here: <http://xplus3.net/2009/04/02/convert-hocr-to-pdf/>. I had to update the script for Python 3. The quality of the resulting PDF seems better. When you search for a word in the PDF, the word is highlighted with a box. The position of that box seems slightly less accurate than with pdfbeads, but it is good enough for me. Perhaps these posts can help somebody.

Regards,
Cédric

