Hi,

this should work (as far as passing configs through pytesseract goes):

import os
import pytesseract
from PIL import Image

# configuration
pytesseract.pytesseract.tesseract_cmd = r"f:\win64_llvm\bin\tesseract.exe"
os.environ["TESSDATA_PREFIX"] = r"f:\Project-Personal\tessdata_best\tessdata"
custom_configs = r'-c hocr_font_type=1'

# OCR
img = Image.open(r"image.png")
hocr = pytesseract.image_to_pdf_or_hocr(img, extension='hocr',
                                        config=custom_configs)
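
If you are not sure about the exact variable name to put after -c, you can ask the tesseract binary which parameters it knows. A minimal sketch, assuming that `tesseract --print-parameters` prints one parameter per line, starting with its name; the helper names filter_parameters and list_parameters are made up for this example:

```python
import subprocess


def filter_parameters(output, prefix):
    """Return the lines of a parameter listing whose name starts with prefix."""
    return [line.strip() for line in output.splitlines()
            if line.strip().startswith(prefix)]


def list_parameters(tesseract_cmd, prefix="hocr"):
    """Run `tesseract --print-parameters` and keep the matching lines."""
    result = subprocess.run([tesseract_cmd, "--print-parameters"],
                            capture_output=True, text=True, check=True)
    # Assumption: the listing goes to stdout; stderr is appended just in case.
    return filter_parameters(result.stdout + result.stderr, prefix)
```

Calling list_parameters(pytesseract.pytesseract.tesseract_cmd) should then show the hocr-related variables of your build. As far as I know the font-related one is listed as hocr_font_info, but please check that against your own output.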

But AFAIR tesseract 4.x does not provide info about font type.
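
To see whether your build actually writes font information, you can look at the title attributes of the word elements in the hOCR output. A rough sketch, assuming the font data (if present) shows up as the hOCR properties x_font and x_fsize on ocrx_word elements; extract_font_info and WordFontParser are hypothetical names for this example, not part of pytesseract:

```python
from html.parser import HTMLParser


class WordFontParser(HTMLParser):
    """Collect x_font / x_fsize properties from hOCR word elements."""

    def __init__(self):
        super().__init__()
        self.fonts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Only look at elements marked as hOCR words.
        if "ocrx_word" not in (attrs.get("class") or "").split():
            return
        props = {}
        # The title attribute holds "name value" pairs separated by ';'.
        for part in (attrs.get("title") or "").split(";"):
            key, _, value = part.strip().partition(" ")
            if key in ("x_font", "x_fsize"):
                props[key] = value.strip()
        if props:
            self.fonts.append(props)


def extract_font_info(hocr):
    """Return one dict per word that carries font properties."""
    if isinstance(hocr, bytes):  # image_to_pdf_or_hocr returns bytes
        hocr = hocr.decode("utf-8")
    parser = WordFontParser()
    parser.feed(hocr)
    parser.close()
    return parser.fonts
```

An empty list from extract_font_info(hocr) would mean that no word in the output carries font properties.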

Zdenko


On Wed, 18 Sep 2019 at 6:37, 'Raphael Alabi' via tesseract-ocr <tesseract-ocr@googlegroups.com> wrote:

> How does one pass in the hocr_font_type 1 parameter to config to be able
> to get font type information through OCR?
> I am a bit lost as to how this is done.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/8e47af0f-a1b9-48aa-b4b6-352f11a8945e%40googlegroups.com
> .
>
