import ctypes
import os
os.putenv("PATH", r'C:\Program Files\Tesseract-OCR')
os.environ["TESSDATA_PREFIX"] = r'C:\Program Files\Tesseract-OCR\tessdata'
liblept = ctypes.cdll.LoadLibrary('liblept-5.dll')
pix = liblept.pixRead('test.png'.encode())
print(pix)
tesseractLib = ctypes.cdll.LoadLibrary('libtesseract-5.dll')
tesseractHandle = tesseractLib.TessBaseAPICreate()
tesseractLib.TessBaseAPIInit3(tesseractHandle, '.', 'eng')
tesseractLib.TessBaseAPISetImage2(tesseractHandle, pix)
# text_out = tesseractLib.TessBaseAPIGetUTF8Text(tesseractHandle)
# print(ctypes.string_at(text_out))
tessPageIterator = tesseractLib.TessResultIteratorGetPageIterator(tesseractHandle)
iteratorLevel = 3  # RIL_WORD; levels in order: RIL_BLOCK=0, RIL_PARA, RIL_TEXTLINE, RIL_WORD, RIL_SYMBOL
tesseractLib.TessPageIteratorBoundingBox(tessPageIterator, iteratorLevel,
                                         ctypes.c_int(0), ctypes.c_int(0),
                                         ctypes.c_int(0), ctypes.c_int(0))
I got this exception:
Traceback (most recent call last):
  File "D:\BaiduYunDownload\programming\Python\CtypesOCR.py", line 25, in <module>
    tesseractLib.TessPageIteratorBoundingBox(tessPageIterator, iteratorLevel, ctypes.c_int(0), ctypes.c_int(0), ctypes.c_int(0), ctypes.c_int(0))
OSError: exception: access violation reading 0x00000018
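For context on the traceback: Tesseract's capi.h declares `BOOL TessPageIteratorBoundingBox(const TessPageIterator* handle, TessPageIteratorLevel level, int* left, int* top, int* right, int* bottom)`, so the four coordinates are out-parameters. The caller has to pass the addresses of C ints (`ctypes.byref(...)`); `ctypes.c_int(0)` passes the integer 0 by value where the library expects a pointer. This may not be the only problem in the code above, but the pattern can be shown without Tesseract installed. In this sketch the "DLL function" is a ctypes callback standing in for the real one; `BOX_PROTO`, `_fake_bounding_box` and `read_box` are hypothetical names, and the coordinates it returns are made up.

```python
import ctypes

# C prototype with the same shape as TessPageIteratorBoundingBox:
# BOOL f(const void* handle, int level, int* left, int* top, int* right, int* bottom)
BOX_PROTO = ctypes.CFUNCTYPE(
    ctypes.c_int,                  # BOOL return value
    ctypes.c_void_p,               # opaque iterator handle
    ctypes.c_int,                  # iterator level (3 == RIL_WORD)
    ctypes.POINTER(ctypes.c_int),  # int* left
    ctypes.POINTER(ctypes.c_int),  # int* top
    ctypes.POINTER(ctypes.c_int),  # int* right
    ctypes.POINTER(ctypes.c_int),  # int* bottom
)


def _fake_bounding_box(handle, level, left, top, right, bottom):
    """Hypothetical stand-in for the DLL function: writes through the pointers."""
    left[0], top[0], right[0], bottom[0] = 10, 20, 110, 45
    return 1


# Wrapping the Python function gives a real C function pointer.
fake_bounding_box = BOX_PROTO(_fake_bounding_box)


def read_box(func, handle, level):
    """Allocate four C ints and pass their addresses, as int* parameters require."""
    left, top, right, bottom = (ctypes.c_int() for _ in range(4))
    ok = func(handle, level, ctypes.byref(left), ctypes.byref(top),
              ctypes.byref(right), ctypes.byref(bottom))
    return bool(ok), (left.value, top.value, right.value, bottom.value)


print(read_box(fake_bounding_box, None, 3))  # → (True, (10, 20, 110, 45))
```

With the real library, `func` would be `tesseractLib.TessPageIteratorBoundingBox` (with matching `argtypes`) and `handle` a genuine page-iterator pointer.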
So what's wrong? The aim of this program is to get the bounding rectangle of
each word. I know about projects like tesserocr
<https://github.com/sirfz/tesserocr> and PyOCR
<https://gitlab.gnome.org/World/OpenPaperwork/pyocr>.
P.S. Specifying the required argument types (function prototypes) for the
DLL functions doesn't matter here. To test that, one could uncomment the
commented lines and comment out the last three lines.
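A minimal sketch of a per-word bounding-box loop through ctypes, assuming Tesseract 4/5's C API (capi.h), the DLL names and install path from the question, and Python 3.8+ on Windows (for `os.add_dll_directory`). It differs from the code above in three ways. First, it runs `TessBaseAPIRecognize` and obtains a `TessResultIterator*` from `TessBaseAPIGetIterator`, because `TessResultIteratorGetPageIterator` takes that result iterator, not the `TessBaseAPI*` handle. Second, it passes the coordinates with `ctypes.byref`. Third, it declares `argtypes`/`restype` despite the P.S.: without `restype`, ctypes assumes every function returns a C `int`, which can truncate 64-bit pointers, and a Python `str` argument is passed as `wchar_t*`, so `'eng'` does not reach the DLL as the C string "eng". The names `TESS_PROTOTYPES`, `declare`, `word_boxes` and the path constants are illustrative, and passing the tessdata directory itself as `datapath` is an assumption about Tesseract 4/5. This has not been run against a real install.

```python
import ctypes
import os

# Paths taken from the question; adjust for your install.
TESSERACT_DIR = r"C:\Program Files\Tesseract-OCR"
TESSDATA_DIR = os.path.join(TESSERACT_DIR, "tessdata")
RIL_WORD = 3  # TessPageIteratorLevel: RIL_BLOCK=0, RIL_PARA, RIL_TEXTLINE, RIL_WORD, RIL_SYMBOL

VP, INT, STR = ctypes.c_void_p, ctypes.c_int, ctypes.c_char_p
INT_P = ctypes.POINTER(ctypes.c_int)

# name -> (argtypes, restype), transcribed from leptonica's and Tesseract's C headers.
LEPT_PROTOTYPES = {
    "pixRead": ([STR], VP),
    "pixDestroy": ([ctypes.POINTER(VP)], None),
}
TESS_PROTOTYPES = {
    "TessBaseAPICreate": ([], VP),
    "TessBaseAPIInit3": ([VP, STR, STR], INT),
    "TessBaseAPISetImage2": ([VP, VP], None),
    "TessBaseAPIRecognize": ([VP, VP], INT),
    "TessBaseAPIGetIterator": ([VP], VP),
    "TessResultIteratorGetPageIterator": ([VP], VP),
    "TessPageIteratorBoundingBox": ([VP, INT, INT_P, INT_P, INT_P, INT_P], INT),
    "TessResultIteratorGetUTF8Text": ([VP, INT], VP),  # raw pointer so it can be freed
    "TessDeleteText": ([VP], None),
    "TessResultIteratorNext": ([VP, INT], INT),
    "TessResultIteratorDelete": ([VP], None),
    "TessBaseAPIEnd": ([VP], None),
    "TessBaseAPIDelete": ([VP], None),
}


def declare(lib, prototypes):
    """Attach argtypes/restype so ctypes stops assuming int return values."""
    for name, (argtypes, restype) in prototypes.items():
        func = getattr(lib, name)
        func.argtypes = argtypes
        func.restype = restype


def word_boxes(image_path, lang="eng"):
    """Return [(word, (left, top, right, bottom)), ...] for one image."""
    with os.add_dll_directory(TESSERACT_DIR):  # Windows, Python 3.8+
        lept = ctypes.CDLL("liblept-5.dll")
        tess = ctypes.CDLL("libtesseract-5.dll")
    declare(lept, LEPT_PROTOTYPES)
    declare(tess, TESS_PROTOTYPES)

    pix = lept.pixRead(image_path.encode())
    if not pix:
        raise OSError(f"pixRead could not read {image_path!r}")
    api = tess.TessBaseAPICreate()
    results = []
    try:
        # bytes, not str: a str would be passed as wchar_t*.
        if tess.TessBaseAPIInit3(api, TESSDATA_DIR.encode(), lang.encode()) != 0:
            raise RuntimeError("TessBaseAPIInit3 failed")
        tess.TessBaseAPISetImage2(api, pix)
        if tess.TessBaseAPIRecognize(api, None) != 0:
            raise RuntimeError("TessBaseAPIRecognize failed")
        it = tess.TessBaseAPIGetIterator(api)  # TessResultIterator*, NULL if no text
        if it:
            # Same underlying iterator viewed as a page iterator; advances with `it`.
            page_it = tess.TessResultIteratorGetPageIterator(it)
            box = [ctypes.c_int() for _ in range(4)]
            while True:
                text_p = tess.TessResultIteratorGetUTF8Text(it, RIL_WORD)
                if text_p:
                    word = ctypes.string_at(text_p).decode("utf-8")
                    tess.TessDeleteText(text_p)
                    # Pass the addresses of four C ints (int* out-parameters).
                    if tess.TessPageIteratorBoundingBox(page_it, RIL_WORD,
                                                        *map(ctypes.byref, box)):
                        results.append((word, tuple(c.value for c in box)))
                if not tess.TessResultIteratorNext(it, RIL_WORD):
                    break
            tess.TessResultIteratorDelete(it)
    finally:
        tess.TessBaseAPIEnd(api)
        tess.TessBaseAPIDelete(api)
        lept.pixDestroy(ctypes.byref(VP(pix)))  # PIX** argument
    return results

# Usage (Windows, DLLs in TESSERACT_DIR): print(word_boxes("test.png"))
```

Keeping the prototypes in one table makes it easy to check each entry against capi.h, and the do-while shape of the loop (read the current word, then `TessResultIteratorNext`) mirrors how Tesseract's own iterator examples walk the results.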
--
You received this message because you are subscribed to the Google Groups
"tesseract-ocr" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/tesseract-ocr/6af57275-1518-4096-b640-641a69fa1398%40googlegroups.com.